Heron
Welcome to the “heron” repository. Heron is a library that seamlessly integrates multiple Vision and Language (V&L) models, as well as Video and Language models. One of its standout features is its support for Japanese V&L models. We also provide pretrained weights trained on various datasets.
Multimodal demo pages built with different LLMs are also available (both demos are in Japanese).
Heron lets you build your own V&L models by combining various modules. The vision encoder, adapter, and LLM are each specified in the configuration file. The distributed training method and the datasets used for training can also be easily configured.
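To illustrate the idea of assembling a model from interchangeable modules, here is a minimal sketch of a typed configuration with sections for the vision encoder, adapter, LLM, distributed training method, and datasets. All field names, module names, and the set of supported distributed methods below are hypothetical placeholders, not Heron's actual configuration schema; consult the repository's own config files for the real keys.

```python
from dataclasses import dataclass, field
from typing import List

# NOTE: every field name and value here is a hypothetical illustration,
# not Heron's real configuration schema.


@dataclass
class ModelConfig:
    vision_encoder: str  # module that turns images (or video frames) into features
    adapter: str         # module that maps vision features into the LLM's embedding space
    llm: str             # language model that generates the text output


@dataclass
class TrainingConfig:
    distributed_method: str                            # hypothetical labels, e.g. "ddp"
    datasets: List[str] = field(default_factory=list)  # names of training datasets


@dataclass
class ExperimentConfig:
    model: ModelConfig
    training: TrainingConfig


# Hypothetical set of distributed training backends accepted by this sketch.
SUPPORTED_DISTRIBUTED = {"ddp", "deepspeed"}


def load_config(raw: dict) -> ExperimentConfig:
    """Build a typed config from a nested dict (e.g. one parsed from a YAML file)."""
    model = ModelConfig(**raw["model"])
    training = TrainingConfig(**raw["training"])
    if training.distributed_method not in SUPPORTED_DISTRIBUTED:
        raise ValueError(f"unknown distributed method: {training.distributed_method}")
    return ExperimentConfig(model=model, training=training)


raw = {
    "model": {
        "vision_encoder": "my-vision-encoder",  # hypothetical module names
        "adapter": "linear-projection",
        "llm": "my-japanese-llm",
    },
    "training": {
        "distributed_method": "deepspeed",
        "datasets": ["dataset_a", "dataset_b"],  # hypothetical dataset names
    },
}

cfg = load_config(raw)
print(cfg.model.adapter)  # linear-projection
```

Keeping each module behind its own config field means swapping, say, the LLM is a one-line change that leaves the vision encoder, adapter, and training settings untouched.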
Organization
License
Released under the Apache License 2.0.
Acknowledgements
GenerativeImage2Text: The main idea of the model is based on the original GIT.
LLaVA: This project learned a lot from the great LLaVA project.