Heron

[日本語] | [English] | [中文]

Welcome to the “heron” repository. Heron is a library that seamlessly integrates multiple Vision and Language (V&L) models, as well as Video and Language models. One of its standout features is its support for Japanese V&L models. We also provide pretrained weights trained on various datasets.

Demo pages for multimodal models built with different LLMs are available (both are in Japanese):

BLIP + Japanese StableLM Base Alpha
GIT + ELYZA-japanese-Llama-2

Heron lets you build your own V&L models by combining various modules. The Vision Encoder, Adapter, and LLM are set in the configuration file, as are the distributed training method and the datasets used for training.
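As a rough illustration of this modular setup, the sketch below models a configuration with one entry per module plus the training options. All field names and values here (`vision_encoder`, `adapter`, `llm`, `distributed_method`, `datasets`, and the `example-*` strings) are hypothetical placeholders chosen for this example; they are not Heron's actual configuration schema, so consult the repository's config files for the real keys.

```python
from dataclasses import dataclass, field
from typing import List

# NOTE: every field name below is an assumption made for illustration.
# It does not reflect Heron's real configuration format.


@dataclass
class ModelConfig:
    vision_encoder: str  # which vision encoder module to use
    adapter: str         # which adapter connects vision features to the LLM
    llm: str             # which language model to use


@dataclass
class TrainingConfig:
    distributed_method: str = "none"  # placeholder for a distributed training method
    datasets: List[str] = field(default_factory=list)


@dataclass
class ExperimentConfig:
    model: ModelConfig
    training: TrainingConfig


REQUIRED_MODULES = {"vision_encoder", "adapter", "llm"}


def load_config(raw: dict) -> ExperimentConfig:
    """Build a typed config from a plain dict, checking that all three modules are named."""
    model_section = raw.get("model", {})
    missing = REQUIRED_MODULES - set(model_section)
    if missing:
        raise ValueError(f"missing model modules: {sorted(missing)}")
    return ExperimentConfig(
        model=ModelConfig(**model_section),
        training=TrainingConfig(**raw.get("training", {})),
    )


# A dict like this could come from parsing a configuration file.
raw = {
    "model": {
        "vision_encoder": "example-vision-encoder",
        "adapter": "example-adapter",
        "llm": "example-llm",
    },
    "training": {
        "distributed_method": "example-method",
        "datasets": ["example-dataset"],
    },
}

config = load_config(raw)
```

Keeping the three modules as separate fields means any one of them can be swapped without touching the others, which is the idea the paragraph above describes.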

Organization

Turing Inc.

License

Released under the Apache License 2.0.

Acknowledgements

GenerativeImage2Text: The main idea of the model is based on the original GIT.
Llava: This project has learned a lot from the great Llava project.
GIT-LLM

Contents

Installation
- 1. Clone this repository
- 2. Install Packages
Training
Evaluation
Datasets
