Giuseppe Bilotta<p>Even now, Thrust as a dependency is one of the main reasons why we have a <a href="https://fediscience.org/tags/CUDA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CUDA</span></a> backend, a <a href="https://fediscience.org/tags/HIP" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HIP</span></a> / <a href="https://fediscience.org/tags/ROCm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ROCm</span></a> backend and a pure <a href="https://fediscience.org/tags/CPU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CPU</span></a> backend in <a href="https://fediscience.org/tags/GPUSPH" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPUSPH</span></a>, but not a <a href="https://fediscience.org/tags/SYCL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SYCL</span></a> or <a href="https://fediscience.org/tags/OneAPI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OneAPI</span></a> backend (which would allow us to extend hardware support to <a href="https://fediscience.org/tags/Intel" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Intel</span></a> GPUs). <<a href="https://doi.org/10.1002/cpe.8313" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">doi.org/10.1002/cpe.8313</span><span class="invisible"></span></a>></p><p>This is also one of the reasons why we implemented our own <a href="https://fediscience.org/tags/BLAS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BLAS</span></a> routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved <a href="https://fediscience.org/tags/BiCGSTAB" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BiCGSTAB</span></a> that I've had the opportunity to mention before <<a href="https://doi.org/10.1016/j.jcp.2022.111413" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">doi.org/10.1016/j.jcp.2022.111</span><span class="invisible">413</span></a>>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and it's tightly integrated with the rest of it (including its support for multi-GPU), and refactoring to turn it into a standalone library like cuBLAS is</p><p>a. too much effort<br>b. probably not worth it.</p><p>Again, following <span class="h-card" translate="no"><a href="https://peoplemaking.games/@eniko" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>eniko</span></a></span>'s original thread, it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.</p><p>6/</p>
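<p>To make the “roll your own” point concrete, here is a minimal CUDA sketch of a saxpy routine (y += a*x), the kind of BLAS building block in question. This is an illustration under simplifying assumptions (single GPU, unified memory, single precision), not GPUSPH's actual implementation, whose kernels are integrated with its backends and multi-GPU support.</p>
<pre><code>// Minimal, self-contained saxpy: y[i] += a * x[i].
// Illustrative sketch only; not taken from GPUSPH.
#include &lt;cstdio&gt;
#include &lt;cuda_runtime.h&gt;

__global__ void saxpy(float a, const float *x, float *y, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i &lt; n)
        y[i] += a * x[i];
}

int main()
{
    const size_t n = 1024 * 1024;
    float *x, *y;
    // Unified memory keeps the example short; a real backend would manage
    // device buffers explicitly (and, for multi-GPU, split them across devices).
    cudaMallocManaged(&amp;x, n * sizeof(float));
    cudaMallocManaged(&amp;y, n * sizeof(float));
    for (size_t i = 0; i &lt; n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const int block = 256;
    const int grid = (int)((n + block - 1) / block);
    saxpy&lt;&lt;&lt;grid, block&gt;&gt;&gt;(3.0f, x, y, n);
    cudaDeviceSynchronize();

    printf("y[0] = %g (expected 5)\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
</code></pre>
<p>Compile with <code>nvcc saxpy.cu</code>. The dot products a BiCGSTAB also needs add a parallel reduction on top of this element-wise pattern, but that too is well-trodden ground.</p>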