Ortho-Hydra: Advanced Experts for DiT LoRA Fine-Tuning

Ortho-Hydra: Orthogonalized Experts for DiT LoRA

Fine-tuning diffusion transformers (DiT) on multi-style datasets raises a recurring problem known as style bleed: a single low-rank residual cannot capture the distinct fingerprints of multiple artists, so the optimizer converges toward an averaged representation. One proposed remedy is a mixture-of-experts LoRA in the HydraLoRA style.

Understanding the Limitations of Existing Approaches

HydraLoRA replaces the single up-projection with multiple expert heads combined by a router. However, when these experts are initialized to zero, the router receives identical gradients from every head and stays anchored to its uniform prior. The experts then evolve in a permutation-symmetric manner, so training effectively reduces to a single rank-$r$ LoRA at $E$ times the cost, where $E$ is the number of experts.
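The symmetry argument can be checked numerically. The sketch below assumes a HydraLoRA-style formulation that the text does not spell out: a shared down-projection `A`, per-expert up-projections `B[e]`, and a softmax router, so the adapter output is `sum_e g[e] * B[e] @ A @ x`. All names, shapes, and the random stand-in for the upstream gradient are illustrative assumptions, not the original implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def router_logit_grad(gates, expert_out, grad_y):
    """Gradient of the loss w.r.t. router logits for y = sum_e g_e * h_e."""
    scores = expert_out @ grad_y               # s_e = <dL/dy, h_e>, shape (E,)
    return gates * (scores - gates @ scores)   # backward pass through softmax

rng = np.random.default_rng(0)
E, r, d_in, d_out = 4, 2, 8, 6                 # illustrative sizes (assumed)
x = rng.normal(size=d_in)                      # one input token
A = rng.normal(size=(r, d_in))                 # shared down-projection
grad_y = rng.normal(size=d_out)                # stand-in for the upstream dL/dy
gates = softmax(np.zeros(E))                   # router starts at the uniform prior

B = np.zeros((E, d_out, r))                    # zero-initialized expert heads
expert_out = B @ (A @ x)                       # per-expert outputs, shape (E, d_out)
grad_logits = router_logit_grad(gates, expert_out, grad_y)

# dL/dB_e = g_e * outer(dL/dy, A x): the same matrix for every expert
grad_B = gates[:, None, None] * np.outer(grad_y, A @ x)

print("router logit gradient:", grad_logits)
print("expert gradients identical:", np.allclose(grad_B, grad_B[0]))
```

Under these assumptions every expert output is zero, so every per-expert score is the same and the softmax backward pass cancels it exactly. The router gets no signal, and each head receives the same update.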

Introducing Ortho-Hydra

To overcome these limitations, we propose Ortho-Hydra, a re-parameterization that combines an orthogonal shared basis with disjoint output subspaces, one per expert. It uses a Cayley-orthogonal parameterization derived from the top-$(Er)$ left singular vectors of the pretrained weights. Because the subspaces are disjoint, the router's per-expert score is non-degenerate from the outset, so individual experts receive a useful gradient signal even before they have been trained.
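A minimal NumPy sketch of one way to build such bases follows. It assumes that the top-$(Er)$ left singular vectors are split into $E$ consecutive blocks of $r$ columns, and that a Cayley map of a skew-symmetric matrix rotates the stacked basis. The function names, the block split, and the point at which the rotation is applied are assumptions for illustration, so the actual construction in the repository may differ.

```python
import numpy as np

def cayley(S):
    """Cayley map: skew-symmetric S -> orthogonal (I - S)(I + S)^-1."""
    I = np.eye(S.shape[0])
    return (I - S) @ np.linalg.inv(I + S)

def expert_bases(W, num_experts, rank, raw_params=None):
    """Split the top-(E*r) left singular vectors of W into E disjoint blocks.

    raw_params (optional, shape (E*r, E*r)) is a hypothetical trainable
    matrix; its skew-symmetric part rotates the basis through the Cayley map.
    """
    k = num_experts * rank
    if k > min(W.shape):
        raise ValueError("E * r must not exceed min(W.shape)")
    U, _, _ = np.linalg.svd(W, full_matrices=False)  # singular values descending
    U_top = U[:, :k]                                  # (d_out, E*r), orthonormal
    if raw_params is not None:
        S = raw_params - raw_params.T                 # skew-symmetric part
        U_top = U_top @ cayley(S)                     # columns stay orthonormal
    return [U_top[:, e * rank:(e + 1) * rank] for e in range(num_experts)]

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 12))                         # stand-in pretrained weight
bases = expert_bases(W, num_experts=3, rank=2,
                     raw_params=0.1 * rng.normal(size=(6, 6)))

# Each block is orthonormal, and different blocks are mutually orthogonal.
for i, Ui in enumerate(bases):
    for j, Uj in enumerate(bases):
        target = np.eye(2) if i == j else np.zeros((2, 2))
        assert np.allclose(Ui.T @ Uj, target, atol=1e-8)
```

With `raw_params=None` the Cayley factor is the identity, so the bases are exactly the singular-vector blocks. Because the blocks span mutually orthogonal subspaces, an output-space vector generally projects differently onto each one, which is the kind of per-expert asymmetry the text describes.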

Experimental Validation

We compared Ortho-Hydra against two HydraLoRA baselines: a zero-initialized shared-basis variant and the original Gaussian-jitter mitigation with σ = 0.1. All runs used identical optimization settings, datasets, and step budgets.

Neither baseline demonstrated any deviation from the uniform prior within the first 1,000 steps.
In contrast, Ortho-Hydra exhibited significant divergence from the uniform prior within the initial few hundred steps.
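The text does not say how deviation from the uniform prior was measured. One plausible diagnostic, shown here purely as an assumption, is the mean per-token KL divergence between the router's gate distribution and the uniform distribution over experts. It is zero for a uniform router and reaches $\log E$ when every token is routed to a single expert.

```python
import numpy as np

def mean_kl_from_uniform(gates, eps=1e-12):
    """Mean KL(g || uniform) over a batch of gate distributions.

    gates: array of shape (num_tokens, num_experts); each row sums to 1.
    The helper name and the choice of metric are illustrative assumptions.
    """
    gates = np.clip(np.asarray(gates, dtype=float), eps, 1.0)
    num_experts = gates.shape[1]
    # KL(g || u) = sum_e g_e * log(g_e * E)
    per_token = np.sum(gates * np.log(gates * num_experts), axis=1)
    return float(per_token.mean())

uniform = np.full((5, 4), 0.25)    # router stuck at the uniform prior
one_hot = np.eye(4)                # every token fully routed to one expert

kl_uniform = mean_kl_from_uniform(uniform)
kl_one_hot = mean_kl_from_uniform(one_hot)
print(kl_uniform, kl_one_hot, np.log(4))
```

Logging a quantity like this every few steps would show whether a run is still pinned near zero or has begun to de-uniformize.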

This early de-uniformization is a promising sign of improved specialization among the experts.

Future Directions

Our focus has been on the construction, cold-start mechanism, and routing dynamics of Ortho-Hydra. End-task generation quality on multi-style data remains an area for future exploration.

Accessing the Code

The implementation of Ortho-Hydra is available on GitHub: https://github.com/sorryhyun/anima_lora.

By addressing the symmetry that keeps routed experts at the uniform prior, Ortho-Hydra is a step toward overcoming style bleed in multi-style fine-tuning.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
