Hugging Face Models
Your trading agents have access to 97 pre-cached machine learning models from Hugging Face. These models are downloaded and cached at build time, so your agent can load them from the local cache at runtime without making any network requests.
We pin each model to a specific commit SHA to ensure your backtest results stay consistent across runs and to support our Foundation Model Explainability Layer (FMEL), which tracks exactly which model versions influenced trading decisions. While we store commit SHAs internally, you can use standard identifiers like main in your code—our download scripts handle the refs mapping automatically.
Usage
Load any model with the standard Hugging Face transformers library. The snippet below loads FinBERT with its sequence-classification head so the outputs are sentiment scores rather than raw hidden states:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load a sentiment model (served from the local cache)
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

# Use for inference
inputs = tokenizer("Markets rallied on strong earnings", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)  # label order given by model.config.id2label
```
Time Series Forecasting (9)
Predict future prices, volumes, and market indicators from historical data.
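As a quick illustration, here is a minimal sketch of a zero-shot forecast with one of the Chronos checkpoints listed below. It assumes the `chronos-forecasting` package is available in your environment (it is not part of transformers itself); the context values and horizon are placeholders.

```python
import torch
from chronos import ChronosPipeline  # assumes chronos-forecasting is installed

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")

# Historical closing prices (placeholder values)
context = torch.tensor([101.2, 101.8, 102.5, 102.1, 103.0, 103.4])

# Sample probabilistic forecasts for the next 5 steps
forecast = pipeline.predict(context, prediction_length=5)  # [1, num_samples, 5]
low, median, high = torch.quantile(
    forecast[0].float(), torch.tensor([0.1, 0.5, 0.9]), dim=0
)
```

The sampled trajectories give you quantile bands rather than a single point estimate, which is useful for sizing positions against forecast uncertainty.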
-
amazon/chronos-t5-tiny Ultra-fast time series forecasting for high-frequency price predictions with minimal latency
AI Summary Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as...
Available Tags: mainCached Revisions:29d808298f1a62493e7b9a5e08529d0d930fa189
-
amazon/chronos-t5-mini Lightweight time series forecasting for intraday price and volume predictions
AI Summary Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as...
Available Tags: mainCached Revisions:bd6a4fde8403b8469acd0abd52852b29dbe61c7b
-
amazon/chronos-t5-small Balanced time series forecasting for daily price trends and market indicators
AI Summary Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as...
Available Tags: mainCached Revisions:a971ba21945c4f1796b17a91fe69214b5f4ad472
-
amazon/chronos-t5-large High-capacity time series forecasting for complex market pattern recognition
AI Summary Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as...
Available Tags: mainCached Revisions:0e46c9c7e2e9f74b53db0617fdfcfe42a413e54a
-
amazon/chronos-bolt-tiny Optimized fast forecasting for real-time trading signal generation
AI Summary Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct...
Available Tags: mainCached Revisions:a0e552de83495b5c28c14c71c374f3e33280b340
-
amazon/chronos-bolt-base Production-grade forecasting with balanced speed and accuracy for live trading
AI Summary Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct...
Available Tags: mainCached Revisions:5d9f166d69f47aef3401367a7b842e78fe97b121
-
amazon/chronos-t5-base Robust time series forecasting for multi-day price patterns and trend analysis
AI Summary Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as...
Available Tags: mainCached Revisions:ad294eaacead15db499b740ea4122266dd2a81a2
-
amazon/chronos-bolt-small Efficient forecasting for streaming price data and quick predictions
AI Summary Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct...
Available Tags: mainCached Revisions:772f3d25d38aec6d914c8949dab4462e2d46f5d8
-
time-series-foundation-models/Lag-Llama Probabilistic forecasting with uncertainty quantification for risk-aware trading
AI Summary Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting Lag-Llama is the first open-source foundation model for time series forecasting! [Model Weights] [Colab Demo 1: Zero-Shot Forecasting] [Colab Demo 2: (Preliminary Finetuning)] 16-Apr-2024: Released pretraining and finetuning scripts to replicate the experiments in the paper. See Reproducing Experiments in the Paper for details. 9-Apr-2024: We have released a 15-minute video 🎥 on Lag-Llama on YouTube. 5-Apr-2024:...
Available Tags: mainCached Revisions:72dcfc29da106acfe38250a60f4ae29d1e56a3d9
Financial Sentiment Analysis (7)
Analyze sentiment in financial news, earnings reports, and social media.
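A minimal sketch of scoring headlines with one of the cached sentiment models through the transformers pipeline API; the headlines are illustrative.

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "Company beats earnings expectations and raises guidance",
    "Regulator opens investigation into accounting practices",
]
for result in classifier(headlines):
    print(result["label"], round(result["score"], 3))
```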
-
ProsusAI/finbert Analyze sentiment in financial news and earnings reports to generate trading signals
AI Summary FinBERT is a pre-trained NLP model to analyze sentiment of financial text. It is built by further training the BERT language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification. Financial PhraseBank by Malo et al. (2014) is used for fine-tuning. For more details, please see the paper FinBERT: Financial Sentiment Analysis with Pre-trained Language Models and our related blog post on Medium. The model will give softmax...
Available Tags: mainCached Revisions:4556d13015211d73dccd3fdd39d39232506f3e43
-
yiyanghkust/finbert-tone Detect positive, negative, neutral tone in financial communications
AI Summary is a BERT model pre-trained on financial communication text. The purpose is to enhance financial NLP research and practice. It is trained on the following three financial communication corpus. The total corpora size is 4.9B tokens. - Corporate Reports 10-K & 10-Q: 2.5B tokens - Earnings Call Transcripts: 1.3B tokens - Analyst Reports: 1.1B tokens This released model is the model fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports. This model...
Available Tags: mainCached Revisions:4921590d3c0c3832c0efea24c8381ce0bda7844b
-
cardiffnlp/twitter-roberta-base-sentiment-latest Social media sentiment analysis for market mood detection
AI Summary Twitter-roBERTa-base for Sentiment Analysis - UPDATED (2022) - Reference Paper: TimeLMs paper. - Git Repo: TimeLMs official repository. Labels: 0 -> Negative; 1 -> Neutral; 2 -> Positive This sentiment analysis model has been integrated into TweetNLP. You can access the demo here.
Available Tags: mainCached Revisions:3216a57f2a0d9c45a2e6c20157c20c49fb4bf9c7
-
StephanAkkerman/FinTwitBERT-sentiment Analyze sentiment from financial Twitter/X posts for social trading signals
AI Summary FinTwitBERT-sentiment is a finetuned model for classifying the sentiment of financial tweets. It uses FinTwitBERT as a base model, which has been pre-trained on 10 million financial tweets. This approach ensures that the FinTwitBERT-sentiment has seen enough financial tweets, which have an informal nature, compared to other financial texts, such as news headlines. Therefore this model performs great on informal financial texts, seen on social media. FinTwitBERT-sentiment is intended for...
Available Tags: mainCached Revisions:da059da3b3bbcb43f9ed1aeb5ae61644010c7e1e
-
Sigma/financial-sentiment-analysis General-purpose financial sentiment for news and social media analysis
AI Summary This model is a fine-tuned version of ahmedrachid/FinancialBERT on the financial_phrasebank dataset. It achieves the following results on the evaluation set: - Loss: 0.0395 - Accuracy: 0.9924 - F1: 0.9924 The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 32 - eval_batch_size: 32 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 5
Available Tags: mainCached Revisions:d78ca172e07e94390f615739cee98a2154381f7e
-
ahmedrachid/FinancialBERT-Sentiment-Analysis Sentiment classification for financial text including SEC filings and analyst reports
AI Summary FinancialBERT is a BERT model pre-trained on a large corpora of financial texts. The purpose is to enhance financial NLP research and practice in financial domain, hoping that financial practitioners and researchers can benefit from this model without the necessity of the significant computational resources required to train the model. The model was fine-tuned for Sentiment Analysis task on _Financial PhraseBank_ dataset. Experiments show that this model outperforms the general BERT and other...
Available Tags: mainCached Revisions:656931965473ec085d195680bd62687b140c038f
-
mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis Fast sentiment analysis for high-volume financial news processing
AI Summary This model is a fine-tuned version of distilroberta-base on the financial_phrasebank dataset. It achieves the following results on the evaluation set: - Loss: 0.1116 - Accuracy: 0.9823 This model is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found here. This model is case-sensitive: it makes a difference between English and English. The model has 6 layers, 768 dimension and 12 heads,...
Available Tags: mainCached Revisions:ae0eab9ad336d7d548e0efe394b07c04bcaf6e91
Text Embeddings (12)
Convert text to vectors for semantic search and document similarity.
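A minimal sketch of computing embeddings and cosine similarity, assuming the sentence-transformers package is installed; the same checkpoints can also be used through plain transformers with mean pooling.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Fed signals rate cuts later this year",
    "Central bank hints at looser monetary policy",
    "Oil prices fall on supply concerns",
]
embeddings = model.encode(docs, normalize_embeddings=True)  # shape (3, 384)
print(util.cos_sim(embeddings[0], embeddings[1]))  # related headlines score high
```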
-
sentence-transformers/all-MiniLM-L12-v2 Balanced speed and quality for production embedding pipelines
AI Summary all-MiniLM-L12-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We used the pretrained model and fine-tuned in on a 1B sentence pairs dataset. We use a contrastive learning objective: given a sentence from the pair, the...
Available Tags: mainCached Revisions:c004d8e3e901237d8fa7e9fff12774962e391ce5
-
sentence-transformers/all-MiniLM-L6-v2 Ultra-fast embeddings for real-time document clustering and search
AI Summary all-MiniLM-L6-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We used the pretrained model and fine-tuned in on a 1B sentence pairs dataset. We use a contrastive learning objective: given a sentence from the pair, the model...
Available Tags: mainCached Revisions:c9745ed1d9f207416be6d2e6f8de32d1f16199bf
-
sentence-transformers/all-mpnet-base-v2 High-quality sentence embeddings for semantic similarity in research
AI Summary all-mpnet-base-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. Text Embeddings Inference (TEI) is a blazing fast inference solution for text embedding models. Send a request to to generate embeddings via the OpenAI Embeddings API: Or check the Text Embeddings Inference API specification instead. The project aims to train sentence embedding models on very large sentenc...
Available Tags: mainCached Revisions:e8c3b32edf5434bc2275fc9bab85f82640a19130
-
BAAI/bge-m3 Multilingual embeddings for international news and cross-market analysis
AI Summary In this project, we introduce BGE-M3, which is distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity. - Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval. - Multi-Linguality: It can support more than 100 working languages. - Multi-Granularity: It is able to process inputs of different granularities, spanning from short...
Available Tags: mainCached Revisions:5617a9f61b028005a4858fdac845db406aefb181
-
BAAI/bge-base-en-v1.5 Balanced embeddings for document similarity and RAG-based market research
AI Summary If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - Long-Context LLM: Activation Beacon - Fine-tuning of LM : LM-Cocktail - Dense Retrieval: BGE-M3, LLM Embedder, BGE Embedding - Reranker Model: BGE Reranker - Benchmark: C-MTEB News - 1/30/2024: Release BGE-M3, a new member to BGE model series! M3 stands for...
Available Tags: mainCached Revisions:a5beb1e3e68b9ab74eb54cfd186867f64f240e1a
-
sentence-transformers/paraphrase-MiniLM-L6-v2 Detect paraphrased content and duplicate news for deduplication
AI Summary sentence-transformers/paraphrase-MiniLM-L6-v2 Using this model becomes easy when you have sentence-transformers installed: This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:
Available Tags: mainCached Revisions:c9a2bfebc254878aee8c3aca9e6844d5bbb102d1
-
BAAI/bge-small-en-v1.5 Fast semantic search for finding similar news articles and market patterns
AI Summary If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - Long-Context LLM: Activation Beacon - Fine-tuning of LM : LM-Cocktail - Dense Retrieval: BGE-M3, LLM Embedder, BGE Embedding - Reranker Model: BGE Reranker - Benchmark: C-MTEB News - 1/30/2024: Release BGE-M3, a new member to BGE model series! M3 stands for...
Available Tags: mainCached Revisions:5c38ec7c405ec4b44b94cc5a9bb96e735b38267a
-
sentence-transformers/multi-qa-MiniLM-L6-cos-v1 Question-answering embeddings for financial document Q&A systems
AI Summary multi-qa-MiniLM-L6-cos-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and was designed for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search In the following some technical details how this model must be used: Note: When loaded with , this model produces normalized embeddings with length 1. In that case,...
Available Tags: mainCached Revisions:b207367332321f8e44f96e224ef15bc607f4dbf0
-
BAAI/bge-large-en-v1.5 High-quality embeddings for complex semantic matching in financial documents
AI Summary If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: - Long-Context LLM: Activation Beacon - Fine-tuning of LM : LM-Cocktail - Dense Retrieval: BGE-M3, LLM Embedder, BGE Embedding - Reranker Model: BGE Reranker - Benchmark: C-MTEB News - 1/30/2024: Release BGE-M3, a new member to BGE model series! M3 stands for...
Available Tags: mainCached Revisions:d4aa6901d3a41ba39fb536a557fa166f842b0e09
-
thenlper/gte-base General text embeddings for document retrieval and similarity
AI Summary General Text Embeddings (GTE) model. Towards General Text Embeddings with Multi-stage Contrastive Learning The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks o...
Available Tags: mainCached Revisions:c078288308d8dee004ab72c6191778064285ec0c
-
thenlper/gte-small Efficient text embeddings for building financial knowledge bases
AI Summary General Text Embeddings (GTE) model. Towards General Text Embeddings with Multi-stage Contrastive Learning The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks o...
Available Tags: mainCached Revisions:17e1f347d17fe144873b1201da91788898c639cd
-
nomic-ai/nomic-embed-text-v1.5 Long-context embeddings for processing lengthy financial reports
AI Summary nomic-embed-text-v1.5: Resizable Production Embeddings with Matryoshka Representation Learning Exciting Update!: is now multimodal! nomic-embed-vision-v1.5 is aligned to the embedding space of , meaning any text embedding is multimodal! Important: the text prompt must include a task instruction prefix, instructing the model which task is being performed. For example, if you are implementing a RAG application, you embed your documents as and embed your user queries as . Purpose: embed texts as...
Available Tags: mainCached Revisions:e5cf08aadaa33385f5990def41f7a23405aec398
Entity Recognition (6)
Extract companies, people, locations, and other entities from text.
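A sketch of entity extraction with the transformers NER pipeline; the sample sentence is illustrative. (The flair model listed below is loaded through the flair library instead.)

```python
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

text = "Apple CEO Tim Cook met suppliers in Shanghai ahead of the iPhone launch."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```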
-
dbmdz/bert-large-cased-finetuned-conll03-english Identify organizations and key figures in earnings reports
AI Summary HuggingFace model for token-classification
Available Tags: mainCached Revisions:4c534963167c08d4b8ff1f88733cf2930f86add0
-
Jean-Baptiste/roberta-large-ner-english RoBERTa-based entity recognition for robust extraction in noisy text
AI Summary roberta-large-ner-english: model fine-tuned from roberta-large for NER task [roberta-large-ner-english] is an english NER model that was fine-tuned from roberta-large on conll2003 dataset. Model was validated on emails/chat data and outperformed other models on this type of data specifically. In particular the model seems to work better on entity that don't start with an upper case. O |Outside of a named entity MISC |Miscellaneous entity PER |Person's name ORG |Organization LOC |Location In...
Available Tags: mainCached Revisions:8f3abc1ef81ffbbb0e80568d4fed1dd10d459548
-
dslim/bert-base-NER Extract company names, people, and locations from financial news
AI Summary bert-base-NER is a fine-tuned BERT model that is ready to use for Named Entity Recognition and achieves state-of-the-art performance for the NER task. It has been trained to recognize four types of entities: location (LOC), organizations (ORG), person (PER) and Miscellaneous (MISC). Specifically, this model is a bert-base-cased model that was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset. If you'd like to use a larger BERT-large model fine-tuned...
Available Tags: mainCached Revisions:d1a3e8f13f8c3566299d95fcfc9a8d2382a9affc
-
dslim/bert-large-NER High-accuracy entity extraction for detailed news analysis
AI Summary bert-large-NER is a fine-tuned BERT model that is ready to use for Named Entity Recognition and achieves state-of-the-art performance for the NER task. It has been trained to recognize four types of entities: location (LOC), organizations (ORG), person (PER) and Miscellaneous (MISC). Specifically, this model is a bert-large-cased model that was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset. If you'd like to use a smaller BERT model fine-tuned on...
Available Tags: mainCached Revisions:6fe43d9ec0bba0f67e367ecd74399216fc409c7f
-
flair/ner-english-ontonotes-large Fine-grained entity recognition including monetary values and dates
AI Summary English NER in Flair (Ontonotes large model) This is the large 18-class NER model for English that ships with Flair. Based on document-level XLM-R embeddings and FLERT. So, the entities "September 1st" (labeled as a date), "George" (labeled as a person), "1 dollar" (labeled as a money) and "Game of Thrones" (labeled as a work of art) are found in the sentence "On September 1st George Washington won 1 dollar while watching Game of Thrones". The following Flair script was used to trai...
Available Tags: mainCached Revisions:4ffb3596f4359f0c8799ea15bbf5dbb3b0915a53
-
xlm-roberta-large-finetuned-conll03-english Multilingual entity recognition for international financial news processing
AI Summary xlm-roberta-large-finetuned-conll03-english 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training 5. Evaluation 6. Environmental Impact 7. Technical Specifications 8. Citation 9. Model Card Authors 10. How To Get Started With the Model The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoye...
Available Tags: mainCached Revisions:18f95e9924f3f452df09cc90945073906ef18f1e
Text Classification (6)
Categorize text into topics without custom training data.
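A sketch of zero-shot topic tagging with the transformers pipeline; the headline and candidate labels are placeholders you would adapt to your strategy.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

headline = "Chipmaker announces $10B buyback after record data-center revenue"
labels = ["earnings", "mergers and acquisitions", "monetary policy", "share buyback"]
result = classifier(headline, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))  # top label and its score
```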
-
joeddav/xlm-roberta-large-xnli Cross-lingual text classification for international market news
AI Summary This model takes xlm-roberta-large and fine-tunes it on a combination of NLI data in 15 languages. It is intended to be used for zero-shot text classification, such as with the Hugging Face ZeroShotClassificationPipeline. This model is intended to be used for zero-shot text classification, especially in languages other than English. It is fine-tuned on XNLI, which is a multilingual NLI dataset. The model can therefore be used with any of the languages in the XNLI corpus: - English - French -...
Available Tags: mainCached Revisions:b227ee8435ceadfa86dc1368a34254e2838bf242
-
facebook/bart-large-mnli Zero-shot classification for categorizing news without training data
AI Summary This is the checkpoint for bart-large after being trained on the MultiNLI (MNLI) dataset. Additional information about this model: - The bart-large model page - BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Yin et al. proposed a method for using pre-trained NLI models as a ready-made zero-shot sequence classifiers. The method works by posing the sequence to be classified as the NLI premise and to construct a hypothesis from...
Available Tags: mainCached Revisions:d7645e127eaf1aefc7862fd59a17a5aa8558b8ce
-
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli Multilingual zero-shot classification for global news categorization
AI Summary Multilingual mDeBERTa-v3-base-mnli-xnli Model description This multilingual model can perform natural language inference (NLI) on 100 languages and is therefore also suitable for multilingual zero-shot classification. The underlying model was pre-trained by Microsoft on the CC100 multilingual dataset. It was then fine-tuned on the XNLI dataset, which contains hypothesis-premise pairs from 15 languages, as well as the English MNLI dataset. As of December 2021, mDeBERTa-base is the best...
Available Tags: mainCached Revisions:8adb042d524ecd5c26d3e3ba0e3fbcf7e2d0864c
-
typeform/distilbert-base-uncased-mnli Fast zero-shot classification for high-volume news filtering
AI Summary Table of Contents - Model Details - How to Get Started With the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact Model Details Model Description: This is the uncased DistilBERT model fine-tuned on Multi-Genre Natural Language Inference (MNLI) dataset for the zero-shot classification task. - Developed by: The Typeform team. - Model Type: Zero-Shot Classification - Language(s): English - License: Unknown - Parent Model: See the distilbert base uncased...
Available Tags: mainCached Revisions:cfa538a0fddbbd978fefe8966c1aeff7ad409c90
-
cross-encoder/nli-deberta-v3-base Accurate text pair classification for comparing market narratives
AI Summary Cross-Encoder for Natural Language Inference This model was trained using SentenceTransformers Cross-Encoder class. This model is based on microsoft/deberta-v3-base Training Data The model was trained on the SNLI and MultiNLI datasets. For a given sentence pair, it will output three scores corresponding to the labels: contradiction, entailment, neutral. Performance - Accuracy on SNLI-test dataset: 92.38 - Accuracy on MNLI mismatched set: 90.04 For futher evaluation results, see SBERT.net -...
Available Tags: mainCached Revisions:6c749ce3425cd33b46d187e45b92bbf96ee12ec7
-
roberta-large-mnli High-quality natural language inference for content categorization
AI Summary Table of Contents - Model Details - How To Get Started With the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact - Technical Specifications - Citation Information - Model Card Authors Model Description: roberta-large-mnli is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. The model is a pretrained model on English language text using a masked language modeling (MLM) objective. - Developed by: See GitHub...
Available Tags: mainCached Revisions:2a8f12d27941090092df78e4ba6f0928eb5eac98
General Sentiment Analysis (6)
General-purpose sentiment analysis for any text source.
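A sketch of pulling a full emotion distribution (fear, joy, and so on) from one of the models below; the input text is illustrative.

```python
from transformers import pipeline

emotion = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return a score for every emotion class
)
scores = emotion(["Investors are panicking as the selloff accelerates"])[0]
top3 = sorted(scores, key=lambda s: s["score"], reverse=True)[:3]
print(top3)
```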
-
cardiffnlp/twitter-roberta-base-sentiment Social media sentiment for retail investor mood analysis
AI Summary Twitter-roBERTa-base for Sentiment Analysis - Reference Paper: _TweetEval_ (Findings of EMNLP 2020). - Git Repo: Tweeteval official repository. Labels: 0 -> Negative; 1 -> Neutral; 2 -> Positive New! We just released a new sentiment analysis model trained on more recent and a larger quantity of tweets. See twitter-roberta-base-sentiment-latest and TweetNLP for more details. Please cite the reference paper if you use this model.
Available Tags: mainCached Revisions:daefdd1f6ae931839bce4d0f3db0a1a4265cd50f
-
siebert/sentiment-roberta-large-english High-accuracy sentiment analysis for important market communications
AI Summary SiEBERT - English-Language Sentiment Classification Overview This model ("SiEBERT", prefix for "Sentiment in English") is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (rev...
Available Tags: mainCached Revisions:74cea614e245b0832c770ec9aa51bd58df965b9c
-
nlptown/bert-base-multilingual-uncased-sentiment Multilingual sentiment for global market sentiment tracking
AI Summary bert-base-multilingual-uncased-sentiment Visit the NLP Town website for an updated version of this model, with a 40% error reduction on product reviews. This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above or for further finetuning on related sentiment analysis tasks. Here is the number of product reviews we used for finetuning the model: The fine-tuned model obtained the following accuracy on 5,000 held-out product reviews ...
Available Tags: mainCached Revisions:8f6f4e3a8f70be4b65d3a4a8762b6d781cda240d
-
finiteautomata/bertweet-base-sentiment-analysis Twitter-optimized sentiment for social trading signals
AI Summary Sentiment Analysis in English bertweet-sentiment-analysis Repository: https://github.com/finiteautomata/pysentimiento/ Model trained with SemEval 2017 corpus (around ~40k tweets). Base model is BERTweet, a RoBERTa model trained on English tweets. is an open-source library for non-commercial use and scientific research purposes only. Please be aware that models are trained with third-party datasets and are subject to their respective licenses. If you use in your work, please cite this paper
Available Tags: mainCached Revisions:924fc4c80bccb8003d21fe84dd92c7887717f245
-
SamLowe/roberta-base-go_emotions Fine-grained emotion classification for nuanced sentiment analysis
AI Summary Model trained from roberta-base on the go_emotions dataset for multi-label classification. A version of this model in ONNX format (including an INT8 quantized ONNX version) is now available at https://huggingface.co/SamLowe/roberta-base-go_emotions-onnx. These are faster for inference, esp for smaller batch sizes, massively reduce the size of the dependencies required for inference, make inference of the model more multi-platform, and in the case of the quantized version reduce the model...
Available Tags: mainCached Revisions:58b6c5b44a7a12093f782442969019c7e2982299
-
j-hartmann/emotion-english-distilroberta-base Emotion detection for understanding market fear and greed
AI Summary With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class: 1) anger 🤬 2) disgust 🤢 3) fear 😨 4) joy 😀 5) neutral 😐 6) sadness 😭 7) surprise 😲 The model is a fine-tuned checkpoint of DistilRoBERTa-base. For a 'non-distilled' emotion model, please refer to the model card of the RoBERTa-large version. a) Run emotion model with 3 lines of code on single text example...
Available Tags: mainCached Revisions:0e1cd914e3d46199ed785853e12b57304e04178b
Question Answering (6)
Extract specific answers from documents like earnings calls and SEC filings.
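A sketch of extractive question answering over a filing excerpt; the context and question are placeholders.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "Net revenue for the quarter was $4.2 billion, up 12% year over year, "
    "while operating margin contracted to 18% due to higher input costs."
)
answer = qa(question="What was net revenue for the quarter?", context=context)
print(answer["answer"], round(answer["score"], 3))
```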
-
deepset/roberta-base-squad2 Extract specific answers from earnings calls and financial reports
AI Summary This is the roberta-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering. We have also released a distilled version of this model called deepset/tinyroberta-squad2. It has a comparable prediction quality and runs at twice the speed of deepset/roberta-base-squad2. Overview Language model: roberta-base Language: English Downstream-task: Extractive QA Training data:...
Available Tags: mainCached Revisions:adc3b06f79f797d1c575d5479d6f5efe54a9e3b4
-
deepset/deberta-v3-base-squad2 High-accuracy Q&A for complex financial document analysis
AI Summary This is the deberta-v3-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering. Overview Language model: deberta-v3-base Language: English Downstream-task: Extractive QA Training data: SQuAD 2.0 Eval data: SQuAD 2.0 Code: See an example extractive QA pipeline built with Haystack Infrastructure: 1x NVIDIA A10G In Haystack Haystack is an AI orchestration framework to build...
Available Tags: mainCached Revisions:eea39c60cc305c2e4a9504f5ff117294bebb42db
-
deepset/tinyroberta-squad2 Ultra-lightweight Q&A for edge deployment and mobile apps
AI Summary This is the distilled version of the deepset/roberta-base-squad2 model. This model has a comparable prediction quality and runs at twice the speed of the base model. Overview Language model: tinyroberta-squad2 Language: English Downstream-task: Extractive QA Training data: SQuAD 2.0 Eval data: SQuAD 2.0 Code: See an example extractive QA pipeline built with Haystack Infrastructure: 4x Tesla v100 Distillation This model was distilled using the TinyBERT approach described in this...
Available Tags: mainCached Revisions:12b287c9df677e28b07f0a023850dba68c997dbf
-
deepset/minilm-uncased-squad2 Efficient Q&A balancing speed and accuracy for production
AI Summary MiniLM-L12-H384-uncased for Extractive QA Overview Language model: microsoft/MiniLM-L12-H384-uncased Language: English Downstream-task: Extractive QA Training data: SQuAD 2.0 Eval data: SQuAD 2.0 Code: See an example extractive QA pipeline built with Haystack Infrastructure: 1x Tesla v100 In Haystack Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents....
Available Tags: mainCached Revisions:934656cdda79824eabf503ed56e15c01ddbdbe3f
-
distilbert-base-cased-distilled-squad Fast Q&A for real-time document querying
AI Summary Table of Contents - Model Details - How To Get Started With the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact - Technical Specifications - Citation Information - Model Card Authors Model Description: The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, adistilled version of BERT, and the paper DistilBERT, adistilled version of BERT: smaller, faster, cheaper and lighter. DistilBERT is a small...
Available Tags: mainCached Revisions:564e9b582944a57a3e586bbb98fd6f0a4118db7f
-
bert-large-uncased-whole-word-masking-finetuned-squad Robust Q&A for extracting key figures from SEC filings
AI Summary BERT large model (uncased) whole word masking finetuned on SQuAD Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is uncased: it does not make a difference between english and English. Differently to other BERT models, this model was trained with a new technique: Whole Word Masking. In this case, all of the tokens corresponding to a word are masked at once. The overall maski...
Available Tags: mainCached Revisions:979de3ccf2f366b17c326254262eff51aec29d62
Text Summarization (6)
Condense long documents into concise summaries.
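A sketch of summarizing an article with the transformers pipeline; the article text and length limits are placeholders.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The company reported quarterly revenue of $4.2 billion, ahead of consensus. "
    "Management raised full-year guidance, citing strong demand in its cloud segment, "
    "and announced an additional $2 billion share repurchase program."
)
summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```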
-
facebook/bart-large-cnn Summarize news articles and earnings call transcripts
AI Summary BART (large-sized model), fine-tuned on CNN Daily Mail BART model pre-trained on English language, and fine-tuned on CNN Daily Mail. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. and first released in this repository (https://github.com/pytorch/fairseq/tree/master/examples/bart). Disclaimer: The team releasing BART did not write a model card for this model so this model card has...
Available Tags: mainCached Revisions:37f520fa929c961707657b28798b30c003dd100b
-
google/pegasus-xsum Generate concise one-sentence summaries of market news
AI Summary Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019 The following is copied from the authors' README. We train a pegasus model with sampled gap sentence ratios on both C4 and HugeNews, and stochastically sample important sentences. The updated the results are reported in this table. The "Mixed & Stochastic" model has the following changes: - trained on both C4 and HugeNews (dataset mixture is weighted by their number of examples). - trained for 1.5M instead of...
Available Tags: mainCached Revisions:8d8ffc158a3bee9fbb03afacdfc347c823c5ec8b
-
philschmid/bart-large-cnn-samsum Summarize conversational content like analyst calls and interviews
AI Summary > If you want to use the model you should try a newer fine-tuned FLAN-T5 version philschmid/flan-t5-base-samsum out socring the BART version with on achieving . This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container.
Available Tags: mainCached Revisions:e49b3d60d923f12db22bdd363356f1a4c68532ad
-
sshleifer/distilbart-cnn-12-6 Fast summarization for high-volume news processing pipelines
AI Summary This checkpoint should be loaded into . See the BART docs for more information.
Available Tags: mainCached Revisions:a4f8f3ea906ed274767e9906dbaede7531d660ff
-
google/flan-t5-base Instruction-tuned model for flexible text generation and summarization
AI Summary 0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7. Environmental Impact 8. Citation 9. Model Card Authors If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. As mentioned in the first few lines of the abstract : > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks,...
Available Tags: mainCached Revisions:7bcac572ce56db69c1ea7c8af255c5d7c9672fc2
-
t5-small Lightweight text-to-text model for various NLP tasks
AI Summary 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. Model Card Authors 9. How To Get Started With the Model The developers of the Text-To-Text Transfer Transformer (T5) write: > With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our...
Available Tags: mainCached Revisions:df1b051c49625cf57a3d0d8d3863ed4d13564fe4
Language Translation (6)
Translate financial news between languages for global market coverage.
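A sketch of translating a headline with one of the cached Marian models; swap the model ID for the language pair you need.

```python
from transformers import pipeline

translate_en_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

result = translate_en_de("The central bank left interest rates unchanged.")
print(result[0]["translation_text"])
```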
-
Helsinki-NLP/opus-mt-en-de Translate German financial news and ECB communications
AI Summary Table of Contents - Model Details - Uses - Risks, Limitations and Biases - Training - Evaluation - Citation Information - How to Get Started With the Model Model Details Model Description: - Developed by: Language Technology Research Group at the University of Helsinki - Model Type: Translation - Language(s): - Source Language: English - Target Language: German - License: CC-BY-4.0 - Resources for more information: - GitHub Repo This model can be used for translation and text-to-text...
Available Tags: mainCached Revisions:6183067f769a302e3861815543b9f312c71b0ca4
-
Helsinki-NLP/opus-mt-en-zh Translate Chinese financial news for Asian market analysis
AI Summary source group: English target group: Chinese OPUS readme: eng-zho model: transformer source language(s): eng target language(s): cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant gan lzh lzh_Hans nan wuu yue yue_Hans yue_Hant model: transformer pre-processing: normalization + SentencePiece (spm32k,spm32k) a sentence initial language token is required in the form of (id = valid target language ID) download original weights: opus-2020-07-17.zip test set translations: opus-2020-07-17.test.txt ...
Available Tags: mainCached Revisions:408d9bc410a388e1d9aef112a2daba955b945255
-
Helsinki-NLP/opus-mt-en-fr Translate French financial news and EU market reports
AI Summary source languages: en target languages: fr OPUS readme: en-fr dataset: opus model: transformer-align pre-processing: normalization + SentencePiece download original weights: opus-2020-02-26.zip test set translations: opus-2020-02-26.test.txt * test set scores: opus-2020-02-26.eval.txt
Available Tags: mainCached Revisions:dd7f6540a7a48a7f4db59e5c0b9c42c8eea67f18
-
Helsinki-NLP/opus-mt-en-es Translate Spanish financial news from Latin American markets
AI Summary source group: English target group: Spanish OPUS readme: eng-spa model: transformer source language(s): eng target language(s): spa model: transformer pre-processing: normalization + SentencePiece (spm32k,spm32k) download original weights: opus-2020-08-18.zip test set translations: opus-2020-08-18.test.txt * test set scores: opus-2020-08-18.eval.txt - opus_readme_url: https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-spa/README.md - prepro: normalization +...
Available Tags: mainCached Revisions:5bc4493d463cf000c1f0b50f8d56886a392ed4ab
-
facebook/mbart-large-50-many-to-many-mmt Multi-language translation for comprehensive global news coverage
AI Summary mBART-50 many to many multilingual machine translation This model is a fine-tuned checkpoint of mBART-large-50. is fine-tuned for multilingual machine translation. It was introduced in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning paper. The model can translate directly between any pair of 50 languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated...
Available Tags: mainCached Revisions:e30b6cb8eb0d43a0b73cab73c7676b9863223a30
-
Helsinki-NLP/opus-mt-ja-en Translate Japanese financial news and BOJ communications to English
AI Summary source languages: ja target languages: en OPUS readme: ja-en dataset: opus model: transformer-align pre-processing: normalization + SentencePiece download original weights: opus-2019-12-18.zip test set translations: opus-2019-12-18.test.txt * test set scores: opus-2019-12-18.eval.txt
Available Tags: mainCached Revisions:0770961a39ba6bd66305b149c3f4110bcafca2e6
Efficient & Lightweight (10)
Smaller models optimized for speed and memory efficiency.
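A sketch of using one of the lightweight checkpoints below as a ready-made classifier on a latency-sensitive path; the input text is illustrative.

```python
from transformers import pipeline

fast_sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(fast_sentiment("Shares jumped after the upbeat guidance"))
# e.g. [{'label': 'POSITIVE', 'score': ...}]
```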
-
huawei-noah/TinyBERT_General_4L_312D Ultra-compact BERT for resource-constrained deployments
AI Summary TinyBERT: Distilling BERT for Natural Language Understanding ======== TinyBERT is 7.5x smaller and 9.4x faster on inference than BERT-base and achieves competitive performances in the tasks of natural language understanding. It performs a novel transformer distillation at both the pre-training and task-specific learning stages. In general distillation, we use the original BERT-base without fine-tuning as the teacher and a large-scale text corpus as the learning data. By performing the...
Available Tags: mainCached Revisions:34707a33cd59a94ecde241ac209bf35103691b43
-
distilbert-base-cased Case-sensitive efficient NLP for proper noun recognition
AI Summary Model Card for DistilBERT base model (cased) This model is a distilled version of the BERT base model. It was introduced in this paper. The code for the distillation process can be found here. This model is cased: it does make a difference between english and English. All the training details on the pre-training, the uses, limitations and potential biases (included below) are the same as for DistilBERT-base-uncased. We highly encourage to check it if you want to know more. DistilBERT is a...
Available Tags: mainCached Revisions:6ea81172465e8b0ad3fddeed32b986cdcdcffcf0
-
google/mobilebert-uncased Mobile-optimized BERT for edge devices and mobile trading apps
AI Summary MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. This checkpoint is the original MobileBert Optimized Uncased English: uncased_L-24_H-128_B-512_A-4_F-4_OPT checkpoint.
Available Tags: mainCached Revisions:1f90a6c24c7879273a291d34a849033eba2dbc0f
-
distilbert-base-uncased Fast general NLP tasks with 60% of BERT performance at 40% size
AI Summary This model is a distilled version of the BERT base model. It was introduced in this paper. The code for the distillation process can be found here. This model is uncased: it does not make a difference between english and English. DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only, with no humans labelling them in any wa...
Available Tags: mainCached Revisions:12040accade4e8a0f71eabdb258fecc2e7e948be
-
distilbert-base-uncased-finetuned-sst-2-english Pre-trained sentiment classifier ready for immediate use
AI Summary Table of Contents - Model Details - How to Get Started With the Model - Uses - Risks, Limitations and Biases - Training Model Details Model Description: This model is a fine-tune checkpoint of DistilBERT-base-uncased, fine-tuned on SST-2. This model reaches an accuracy of 91.3 on the dev set (for comparison, Bert bert-base-uncased version reaches an accuracy of 92.7). - Developed by: Hugging Face - Model Type: Text Classification - Language(s): English - License: Apache-2.0 - Parent Model: Fo...
Available Tags: mainCached Revisions:714eb0fa89d2f80546fda750413ed43d93601a13
-
huawei-noah/TinyBERT_General_6L_768D Balanced tiny model with good accuracy for production use
AI Summary HuggingFace model: huawei-noah/TinyBERT_General_6L_768D
Available Tags: mainCached Revisions:8b6152f3be8ab89055dea2d040cebb9591d97ef6
-
google/electra-small-discriminator Efficient pre-trained model for text classification tasks
AI Summary ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large sca...
Available Tags: mainCached Revisions:fa8239aadc095e9164941d05878b98afe9b953c3
-
distilroberta-base Efficient RoBERTa for fast inference in production pipelines
AI Summary 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. How To Get Started With the Model This model is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found here. This model is case-sensitive: it makes a difference between english and English. The model has 6 layers, 768 dimension and 12 heads, totalizing 82M parameters...
Available Tags: mainCached Revisions:fb53ab8802853c8e4fbdbcd0529f21fc6f459b2b
-
albert-base-v2 Parameter-efficient model with cross-layer sharing for memory savings
AI Summary Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model, as all ALBERT models, is uncased: it does not make a difference between english and English. Disclaimer: The team releasing ALBERT did not write a model card for this model so this model card has been written by the Hugging Face team. ALBERT is a transformers model pretrained on a large corpus of English data in a...
Available Tags: mainCached Revisions:8e2f239c5f8a2c0f253781ca60135db913e5c80c
-
albert-large-v2 Larger ALBERT variant balancing efficiency and performance
AI Summary Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model, as all ALBERT models, is uncased: it does not make a difference between english and English. Disclaimer: The team releasing ALBERT did not write a model card for this model so this model card has been written by the Hugging Face team. ALBERT is a transformers model pretrained on a large corpus of English data in a...
Available Tags: mainCached Revisions:dfed3a5ef4499fb3351c4ebbcf487375d1e942c8
Foundation Models (8)
Pre-trained models ready for fine-tuning on your specific tasks.
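A sketch of attaching a fresh classification head to a cached foundation model as the starting point for fine-tuning; the three-label setup is an example.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # e.g. bullish / neutral / bearish
)
# The randomly initialized head is then trained on your own labeled data,
# for example with the transformers Trainer API or a custom PyTorch loop.
```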
-
bert-base-uncased Foundation model for fine-tuning on custom financial tasks
AI Summary Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is uncased: it does not make a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means...
Available Tags: mainCached Revisions:86b5e0934494bd15c9632b12f734a8a67f723594
-
microsoft/deberta-v3-base Advanced BERT variant with disentangled attention for better accuracy
AI Summary DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version...
Available Tags: mainCached Revisions:8ccc9b6f36199bec6961081d44eb72fb3f7353f3
-
bert-base-cased Case-sensitive BERT for tasks where capitalization matters
AI Summary Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means...
Available Tags: mainCached Revisions:cd5ef92a9fb2f889e972770a36d4ed042daf221e
-
roberta-base Robustly optimized BERT for improved downstream performance
AI Summary Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. Disclaimer: The team releasing RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This...
Available Tags: mainCached Revisions:e2da8e2f811d1448a5b465c236feacd80ffbac7b
-
microsoft/deberta-v3-large High-performance DeBERTa for complex NLU tasks
AI Summary DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version...
Available Tags: mainCached Revisions:64a8c8eab3e352a784c658aef62be1662607476f
-
bert-large-uncased High-capacity BERT for complex financial NLP tasks
AI Summary Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is uncased: it does not make a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means...
Available Tags: mainCached Revisions:6da4b6a26a1877e173fca3225479512db81a5e5b
-
roberta-large Large RoBERTa for state-of-the-art text understanding
AI Summary Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. Disclaimer: The team releasing RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This...
Available Tags: mainCached Revisions:722cf37b1afa9454edce342e7895e588b6ff1d59
-
xlnet-base-cased Permutation-based model capturing bidirectional context
AI Summary XLNet model pre-trained on English language. It was introduced in the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang et al. and first released in this repository. Disclaimer: The team releasing XLNet did not write a model card for this model so this model card has been written by the Hugging Face team. XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally,...
Available Tags: mainCached Revisions:ceaa69c7bc5e512b5007106a7ccbb7daf24b2c79
Small Language Models (8)
Compact language models for text generation and reasoning.
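A sketch of generating text with one of the small instruct models using the tokenizer's chat template; the prompt and generation settings are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Summarize in one sentence: tech stocks rose while bond yields fell."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=60, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```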
-
microsoft/phi-2 Compact LLM for text generation and reasoning tasks
AI Summary Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased a nearly state-of-the-art performance among models with less than 13 billion parameters. Our model hasn't been fine-tuned through...
Available Tags: mainCached Revisions:810d367871c1d460086d9f82db8696f2e0a0fcd0
-
microsoft/Phi-3.5-mini-instruct Latest Phi model with improved instruction following
AI Summary Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data. The model belongs to the Phi-3 model family and supports 128K token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence an...
Available Tags: mainCached Revisions:2fe192450127e6a83f7441aef6e3ca586c338b77
-
Qwen/Qwen2.5-1.5B-Instruct Balanced small LLM for generating trading insights and summaries
AI Summary Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens)...
Available Tags: main | Cached Revisions: 989aa7980e4cf806f80c7fef2b1adb7bc71aa306
-
Qwen/Qwen2.5-0.5B-Instruct Ultra-lightweight chat model for simple agent interactions
AI Summary Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens)...
Available Tags: main | Cached Revisions: 7ae557604adf67be50417f59c2c2f167def9a775
-
microsoft/Phi-3-mini-4k-instruct Instruction-tuned small LLM for interactive agent responses
AI Summary The Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, which is the context length (in tokens) it can support. The model has undergone a post-training process that incorporates both supervised...
Available Tags: main | Cached Revisions: f39ac1d28e925b323eae81227eaba4464caced4e
-
Qwen/Qwen2.5-3B-Instruct Capable small LLM for complex reasoning and analysis
AI Summary Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens)...
Available Tags: main | Cached Revisions: aa8e72537993ba99e69dfaafa59ed015b17504d1
-
stabilityai/stablelm-2-1_6b Stability AI's efficient LLM for text generation tasks
AI Summary Please note: For commercial use, please refer to https://stability.ai/license. Stable LM 2 1.6B is a 1.6 billion parameter decoder-only language model pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs. Get started generating text with Stable LM 2 1.6B by using the following code snippet: Developed by: Stability AI Model type: Stable LM 2 1.6B models are auto-regressive language models based on the transformer decoder architecture. Language(s): English Paper: Stable LM 2 1.6B Technical Report Library:...
Available Tags: main | Cached Revisions: f499ead74c53749bd93cebc6ce8bc0d7bdf1eaef
-
TinyLlama/TinyLlama-1.1B-Chat-v1.0 Compact Llama-style model for lightweight chat applications
AI Summary The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs. The training has started on 2023-09-01. We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multit...
Available Tags: main | Cached Revisions: fe8a4ea1ffedaf415f4da2f062534de366a451e6
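The instruction-tuned models in this group follow chat-style prompts. As a minimal sketch, assuming Qwen/Qwen2.5-0.5B-Instruct and an illustrative system/user prompt (not a recommendation), apply_chat_template builds the prompt in the format the model expects before generation:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

messages = [
    {"role": "system", "content": "You summarize market news in one sentence."},
    {"role": "user", "content": "Tech stocks fell after weak guidance from chipmakers."},
]

# Instruct models expect their own chat template; this returns the prompt token ids
prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output_ids = model.generate(prompt_ids, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
summary = tokenizer.decode(output_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=True)
print(summary)
Greedy decoding (do_sample=False) keeps the output deterministic, which helps when a backtest should be reproducible; the same pattern applies to the Phi, StableLM, and TinyLlama entries, each with its own chat template.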
Code Generation (3)
Understand and generate code for automated strategy development. A usage sketch follows the list below.
-
Salesforce/codegen-350M-mono Generate Python code for automated trading strategy development
AI Summary CodeGen is a family of autoregressive language models for program synthesis from the paper: A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. The models were originally released in this repository under 3 pre-training data variants (nl, multi, mono) and 4 model size variants (350M, 2B, 6B, 16B). The checkpoint included in this repository is denoted as CodeGen-Mono 350M in the paper, where "Mono" means the mo...
Available Tags: main | Cached Revisions: d9107f71cca463240db1143f4a75a927a27fcb27
-
microsoft/codebert-base Code understanding and search for strategy analysis
AI Summary CodeBERT-base. Pretrained weights for CodeBERT: A Pre-Trained Model for Programming and Natural Languages. Training data: the model is trained on bi-modal data (documents & code) from CodeSearchNet. Training objective: initialized from RoBERTa-base and trained with the MLM+RTD objective (cf. the paper). References: 1. CodeBERT trained with the Masked LM objective (suitable for code completion); 2. Hugging Face's CodeBERTa (small size, 6 layers)
Available Tags: main | Cached Revisions: 3b0952feddeffad0063f274080e3c23d75e7eb39
-
Qwen/Qwen2.5-Coder-1.5B-Instruct Instruction-tuned code model for generating trading algorithms
AI Summary Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes, 0.5, 1.5, 3, 7, 14, and 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in code generation, code reasoning and code fixing. Based on the strong Qwen2.5, we scaled up the training tokens to 5.5 trillion, including...
Available Tags: main | Cached Revisions: 2e1fd397ee46e1388853d2af2c993145b0f1098a
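For the code models, generation works the same way as for the small LLMs, just with a code prompt. A minimal sketch, assuming Salesforce/codegen-350M-mono and a hypothetical moving_average stub that is not part of the catalog:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical function stub for the model to complete
prompt = "def moving_average(prices, window):\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps completions deterministic across backtest runs;
# pad_token_id is set explicitly because the CodeGen tokenizer has no pad token
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
Note that microsoft/codebert-base is an encoder: load it with AutoModel for code search and similarity rather than with AutoModelForCausalLM.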
Multilingual Models (4)
Models that work across multiple languages. A usage sketch follows the list below.
-
FacebookAI/xlm-roberta-base Multilingual language model for cross-language text understanding
AI Summary XLM-RoBERTa model pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. It was introduced in the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. and first released in this repository. Disclaimer: The team releasing XLM-RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data...
Available Tags: main | Cached Revisions: e73636d4f797dec63c3081bb6ed5c7b0bb3f2089
-
microsoft/mdeberta-v3-base Multilingual DeBERTa for international text processing
AI Summary DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. DeBERTa improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB of training data. In DeBERTa V3, we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient-Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version...
Available Tags: main | Cached Revisions: a0484667b22365f84929a935b5e50a51f71f159d
-
FacebookAI/xlm-roberta-large Large multilingual model for high-quality cross-language NLP
AI Summary XLM-RoBERTa model pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. It was introduced in the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. and first released in this repository. Disclaimer: The team releasing XLM-RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data...
Available Tags: main | Cached Revisions: c23d21b0620b635a76227c604d44e43a9f0ee389
-
google/bigbird-roberta-base Long-context model for processing lengthy financial documents
AI Summary BigBird is a sparse-attention-based transformer which extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle. It is pretrained on English language text using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. Disclaimer: The team releasing BigBird did not write a model...
Available Tags: main | Cached Revisions: 5a145f7852cba9bd431386a58137bf8a29903b90
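The multilingual encoders can be exercised directly with the fill-mask pipeline. A minimal sketch, assuming FacebookAI/xlm-roberta-base and an illustrative German headline that is not part of the catalog (XLM-RoBERTa uses <mask> as its mask token):
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="FacebookAI/xlm-roberta-base")

# German: "The central bank raises the <mask>."
for prediction in fill_mask("Die Zentralbank erhöht die <mask>."):
    print(f"{prediction['token_str']:>15}  {prediction['score']:.3f}")
For cross-language similarity or classification, the same mean-pooling pattern shown for the English encoders applies; XLM-RoBERTa embeds text from its 100 training languages into a shared representation space.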