#blas

Giuseppe Bilotta
Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <https://doi.org/10.1002/cpe.8313>

This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <https://doi.org/10.1016/j.jcp.2022.111413>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and is tightly integrated with the rest of it (including its support for multi-GPU), and refactoring it into a library like cuBLAS is

a. too much effort
b. probably not worth it.

Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.

6/
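To illustrate the “roll your own” point above: a minimal sketch (not GPUSPH's actual code) of the two BLAS-1 primitives a BiCGSTAB-style iteration leans on most, axpy and dot, written here as plain C. The function names are illustrative only.

```c
/* Minimal hand-rolled BLAS-1 primitives of the kind a BiCGSTAB-style
 * solver needs. Illustrative sketch, not GPUSPH's routines. */
#include <stddef.h>

/* y <- a*x + y */
static void my_axpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* <x, y> */
static double my_dot(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

A GPU backend would express the same operations as kernels plus a reduction, but the interfaces stay this small.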
FCLC
Simple question: what is your *default* BLAS package?
#HPC #BLAS
FCLC
What does this mean? It means that we now have a dedicated matrix ASIC that can be used via standard opcodes/compilers, available to anyone with a relevant toolchain and compiler.

For the most part, expect all of your #BLAS kernels to gain support over time!

For #HPC, in contrast with most matrix tile implementations, we have spec-mandated single- and double-precision support.

That's in contrast with the x86 AMX extensions, most consumer dGPU implementations, etc., which are 19 bits and below.
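One way to read “expect your BLAS kernels to gain support over time”: application code keeps calling the standard BLAS interface, and the new hardware is picked up underneath by the toolchain and the BLAS build. A hedged sketch, assuming a CBLAS implementation such as OpenBLAS is linked:

```c
/* User code is unchanged when a matrix unit gains compiler/library
 * support: it keeps calling the standard CBLAS interface and the BLAS
 * build targets the new hardware underneath. Link e.g. with -lopenblas. */
#include <cblas.h>
#include <stdio.h>

int main(void)
{
    /* C = alpha*A*B + beta*C in double precision, row-major 2x2 */
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```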
FCLC
Time for an #introduction!
I'm a young Canuck with interests/experience in #HPC, #Linux, #BLAS, #SYCL, #C, #AVX512, #Rust, heterogeneous compute & other such things.

Currently my personal projects are bringing #FP16 to the #OpenBLAS library, working to standardize what complex-domain BLAS FP16 kernels/implementations should look like, and making sure #SYCL is available everywhere.

I also write every now and again. Here's the tale of AVX512 FP16 on Alder Lake:
https://gist.github.com/FCLC/56e4b3f4a4d98cfd274d1430fabb9458
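For context on the FP16 BLAS work mentioned above, here is a minimal sketch of what a half-precision axpy kernel can look like with the AVX512 FP16 intrinsics. `haxpy` is an illustrative name, not an OpenBLAS routine, and the loop assumes `n` is a multiple of 32 to keep the example short.

```c
/* Sketch of a half-precision axpy using AVX512-FP16 intrinsics.
 * Illustrative only, not OpenBLAS's kernel. Build for an AVX512-FP16
 * target (e.g. gcc/clang -mavx512fp16). A real kernel also needs a
 * scalar or masked tail for n not divisible by 32. */
#include <immintrin.h>
#include <stddef.h>

/* y <- a*x + y, in _Float16 */
static void haxpy(size_t n, _Float16 a, const _Float16 *x, _Float16 *y)
{
    const __m512h va = _mm512_set1_ph(a);
    for (size_t i = 0; i < n; i += 32) {      /* 32 halves per zmm register */
        __m512h vx = _mm512_loadu_ph(x + i);
        __m512h vy = _mm512_loadu_ph(y + i);
        vy = _mm512_fmadd_ph(va, vx, vy);     /* a*x + y */
        _mm512_storeu_ph(y + i, vy);
    }
}
```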