#blas

Giuseppe Bilotta
Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <https://doi.org/10.1002/cpe.8313>

This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <https://doi.org/10.1016/j.jcp.2022.111413>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and is tightly integrated with the rest of it (including its support for multi-GPU), and refactoring it into a library like cuBLAS is

a. too much effort
b. probably not worth it.

Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.

6/
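To illustrate the “roll your own” point above: a minimal sketch (not GPUSPH's actual code) of the two BLAS-1 primitives a BiCGSTAB-style iteration leans on most, axpy and dot, written here as plain C. The function names are illustrative only.

```c
/* Minimal hand-rolled BLAS-1 primitives of the kind a BiCGSTAB-style
 * solver needs. Illustrative sketch, not GPUSPH's routines. */
#include <stddef.h>

/* y <- a*x + y */
static void my_axpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* <x, y> */
static double my_dot(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

A GPU backend would express the same operations as kernels plus a reduction, but the interfaces stay this small.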
FCLC
Simple question: what is your *default* BLAS package?
#HPC #BLAS
FCLC
What does this mean? It means that we now have a dedicated matrix ASIC that can be used via standard opcodes/compilers, available to anyone with a relevant toolchain and compiler.

For the most part, expect all of your #BLAS kernels to gain support over time!

For #HPC, in contrast with most matrix tile implementations, we have spec-mandated single- and double-precision support.

That's in contrast with the x86 AMX extensions, most consumer dGPU implementations, etc., which are 19 bits and below.
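One way to read “expect your BLAS kernels to gain support over time”: application code keeps calling the standard BLAS interface, and the new hardware is picked up underneath by the toolchain and the BLAS build. A hedged sketch, assuming a CBLAS implementation such as OpenBLAS is linked:

```c
/* User code is unchanged when a matrix unit gains compiler/library
 * support: it keeps calling the standard CBLAS interface and the BLAS
 * build targets the new hardware underneath. Link e.g. with -lopenblas. */
#include <cblas.h>
#include <stdio.h>

int main(void)
{
    /* C = alpha*A*B + beta*C in double precision, row-major 2x2 */
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```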
FCLC
Time for an #introduction!
I'm a young Canuck with interests/experience in #HPC, #Linux, #BLAS, #SYCL, #C, #AVX512, #Rust, heterogeneous compute & other such things.

Currently my personal projects are bringing #FP16 to the #OpenBLAS library, working to standardize what complex-domain BLAS FP16 kernels/implementations should look like, and making sure #SYCL is available everywhere.

I also write every now and again. Here's the tale of AVX512 FP16 on Alder Lake:
https://gist.github.com/FCLC/56e4b3f4a4d98cfd274d1430fabb9458
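For context on the FP16 BLAS work mentioned above, here is a minimal sketch of what a half-precision axpy kernel can look like with the AVX512 FP16 intrinsics. `haxpy` is an illustrative name, not an OpenBLAS routine, and the loop assumes `n` is a multiple of 32 to keep the example short.

```c
/* Sketch of a half-precision axpy using AVX512-FP16 intrinsics.
 * Illustrative only, not OpenBLAS's kernel. Build for an AVX512-FP16
 * target (e.g. gcc/clang -mavx512fp16). A real kernel also needs a
 * scalar or masked tail for n not divisible by 32. */
#include <immintrin.h>
#include <stddef.h>

/* y <- a*x + y, in _Float16 */
static void haxpy(size_t n, _Float16 a, const _Float16 *x, _Float16 *y)
{
    const __m512h va = _mm512_set1_ph(a);
    for (size_t i = 0; i < n; i += 32) {      /* 32 halves per zmm register */
        __m512h vx = _mm512_loadu_ph(x + i);
        __m512h vy = _mm512_loadu_ph(y + i);
        vy = _mm512_fmadd_ph(va, vx, vy);     /* a*x + y */
        _mm512_storeu_ph(y + i, vy);
    }
}
```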