Ubuntu HowTo: No module named azure.storage.blob – Azure Linux VM (ubuntu 18.04)


I have created a Linux VM (Ubuntu 18.04) in Azure and installed Python 3.7 on it using sudo apt install python3.7. Python programs are running fine. I then tried to install azure-storage-blob but could not find any apt-get package, so I tried pip3: I first installed pip3 and then ran sudo pip3 install azure-storage-blob. It installed successfully. Now I tried to run the following simple code:

import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

try:
    print("Azure Blob storage v12 - Python quickstart sample")
    # Quick start code goes here
except Exception as ex:
    print('Exception:')
    print(ex)

Getting Error:

Traceback (most recent call last):
  File "basicblob1.py", line 2, in <module>
    from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
ImportError: No module named azure.storage.blob
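
One hedged observation (not from the post): the unquoted module name in the traceback is the Python 2 style of ImportError, so the script may have been run with `python` (Python 2) while `pip3` installed the package for Python 3. A minimal check of which interpreter is running and whether the package is visible to it:

```python
# Diagnostic sketch: report the running interpreter and whether
# azure.storage.blob can be imported by it.
import sys

print("Running under:", sys.executable, "Python", sys.version.split()[0])

try:
    import azure.storage.blob  # the package installed via pip3 in the post
    print("azure.storage.blob is importable for this interpreter")
except ImportError as ex:
    print("Import failed for this interpreter:", ex)
```

If the import succeeds under `python3` but the original invocation fails, running the script as `python3 basicblob1.py` (or installing the package for whichever interpreter you actually use) should resolve it.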


Math Genius: What is correct ranking for Spearman Correlation?


In order to calculate the Spearman correlation coefficient, the data must be ranked. However, people do this in different ways. Some rank in increasing order (i.e. the smallest number gets rank 1 and the greatest gets rank $n$); others do the opposite, giving the highest rank to the smallest number and rank 1 to the greatest. Can you suggest the most appropriate way to do this?

Fake data simulated in R for purposes of demonstration.

set.seed(2020)
x = rnorm(15, 100, 15)
round(x,2)
 [1] 105.65 104.52  83.53  83.04  58.05 110.81 114.09  96.56
 [9] 126.39 101.76  87.20 113.64 117.95  94.43  98.15
y = .001*x^4 + rnorm(15, 0, 4)
round(y)
 [1] 124617 119365  48669  47550  11357 150771 169415  86933
 [9] 255158 107233  57828 166771 193524  79500  92807

A scatterplot shows positive, but not entirely linear,
association.

plot(x,y, pch=20)


Notice that Pearson and Spearman correlation differ.
Roughly speaking, Pearson correlation measures the
linear component of the association. The Pearson
correlation $r = 0.948$ shows substantial, but not
perfect, linear association.

By contrast, each increase in $x$ is accompanied by
an increase in $y.$ This leads to a Spearman correlation
$r_S = 1.$

cor(x,y, method="pearson")
[1] 0.9481193
cor(x,y, method="spearman")
[1] 1

As you say, the Spearman correlation is based on ranks.
Notice that the $x$'s and $y$'s have ranks that match exactly. This is another way of saying that each increase in $x$ is accompanied by an increase in $y.$

Notice that rank 1 for the $x$'s corresponds to the minimum $x$-value 58.05, and rank 1 for the $y$'s corresponds to the minimum $y$-value 11,357. Similarly, rank 15 corresponds to the maximum of each variable.

rank(x)
 [1] 10  9  3  2  1 11 13  6 15  8  4 12 14  5  7
rank(y)
 [1] 10  9  3  2  1 11 13  6 15  8  4 12 14  5  7

The Spearman correlation can be found by taking the
Pearson correlation of the ranks.

cor(rank(x), rank(y), method = "pearson")
[1] 1
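
As for the direction of ranking: ranking both variables ascending or both descending gives the same Spearman coefficient, because reversing both rank vectors applies the same order-reversing linear map to each. A stdlib-only Python sketch using the rounded data above (no ties, so simple ordinal ranks suffice):

```python
# Spearman correlation = Pearson correlation of the ranks; the direction
# of ranking does not matter as long as both variables use the same one.

def rank(values, descending=False):
    # ordinal ranks 1..n (fine here: the toy data has no ties)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

x = [105.65, 104.52, 83.53, 83.04, 58.05, 110.81, 114.09, 96.56,
     126.39, 101.76, 87.20, 113.64, 117.95, 94.43, 98.15]
y = [124617, 119365, 48669, 47550, 11357, 150771, 169415, 86933,
     255158, 107233, 57828, 166771, 193524, 79500, 92807]

asc = pearson(rank(x), rank(y))               # both ranked ascending
desc = pearson(rank(x, True), rank(y, True))  # both ranked descending
print(asc, desc)                              # both approximately 1.0
```

Reversing the ranks of only one variable would flip the sign of $r_S$; as long as the convention is applied consistently to both variables, either choice is valid.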

The Wikipedia article on Spearman correlation has some nice examples.


Server Bug Fix: SR-IOV not working on Ubuntu 20.04 (hyper-v), works fine on 18.04 LTS


I’m running Ubuntu server virtual machines on a Windows Server 2019 system using Hyper-V.
The system is equipped with an Intel I350 network adapter. (The Ubuntu VMs recognize it as “Ethernet controller: Intel Corporation I350 Virtual Function (rev 01)”.)

The Ubuntu 18.04 machines are using SR-IOV out of the box.
These are using the kernel: 4.15.0-99-generic #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

When I upgraded these 18.04 machines to the latest HWE kernel, SR-IOV stopped working.
Hyper-V reports the network adapter's state as degraded (SR-IOV not operational).
I submitted a bug on Launchpad about this issue over a year ago, but never received an answer (https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1818400).

With the new Ubuntu 20.04 release I'm having the same problem. SR-IOV is not working out of the box on these new Ubuntu VMs either.

Does someone here know how to get SR-IOV working on newer kernels? Or how to raise this bug with the Ubuntu developers so that it gets their attention?

What I’ve already tried:
I’ve compared the kernel modules loaded between the 18.04 and 20.04 machines.
On the 20.04 machine I enabled all the modules missing relative to 18.04, by listing them in the /etc/modules file:

  • ib_cm
  • ib_core
  • ib_iser
  • iscsi_tcp
  • iw_cm
  • rdma_cm
  • libiscsi
  • libiscsi_tcp
  • pcbc

I noticed these three modules could not be loaded on 20.04:

  • pps_core
  • ptp
  • aes_x86_64
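
The module comparison can be sketched like this; the module lists below are a toy stand-in for the real output of `lsmod | awk '{print $1}' | sort` on each machine:

```shell
# Write a sorted module list per machine (toy data in place of lsmod output).
printf '%s\n' ib_cm ib_core pcbc ptp | sort > modules-1804.txt
printf '%s\n' ib_cm ib_core | sort > modules-2004.txt

# Lines only in the first file = modules loaded on 18.04 but not on 20.04.
comm -23 modules-1804.txt modules-2004.txt
```

On the real machines, `modinfo <name>` tells you whether a given module even exists for the running kernel before you add it to /etc/modules.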

Math Genius: Solve $\sum_{i=1}^n a_i\exp(-b_ix) = 1$


Problem. Solve $\displaystyle \sum_{i=1}^n a_i\exp(-b_ix) = 1$ where $n$ is a positive integer and $a_i, b_i\ (i=1,2,\dots,n)$ are positive constants.

I am not sure this can be solved analytically. The LHS is strictly decreasing and continuous, tending to $+\infty$ as $x\to-\infty$ and to $0$ as $x\to+\infty$, so the solution exists and is unique.

Assuming that the $b_i$'s are rational numbers with irreducible denominators $d_i$, if we set

$$t:=\exp\left(-\frac x{\operatorname{lcm}(d_i)}\right),$$

we get the polynomial equation

$$\sum_{i=1}^n a_i t^{n_i}-1=0$$ where $n_i=b_i\operatorname{lcm}(d_i)$.

Since polynomial equations of degree five and higher have no general solution in radicals, this shows that the equation does not have an analytical solution in the general case.


If you consider the logarithm of the LHS (and look for its zeroes), it has two oblique asymptotes. For large positive $x$, it reduces to $\log a_{\min}-b_{\min}x$, and for large negative $x$, to $\log a_{\max}-b_{\max}x$ (the $\min$ and $\max$ indices refer to $b$). This gives you an approximation of the root at the crossing point.
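
A numerical sketch of this (with made-up constants $a=(2,3)$, $b=(0.5,2)$, not from the answer): bisection finds the unique root, and the zero of the large-$x$ asymptote, $x=\log(a_{\min})/b_{\min}$, gives a first approximation.

```python
# Solve sum_i a_i * exp(-b_i x) = 1 by bisection; the LHS is strictly
# decreasing, so the root is unique. Constants are illustrative.
from math import exp, log

a = [2.0, 3.0]
b = [0.5, 2.0]

def f(x):
    return sum(ai * exp(-bi * x) for ai, bi in zip(a, b)) - 1.0

lo, hi = -50.0, 50.0          # f(lo) > 0 > f(hi)
for _ in range(200):
    mid = (lo + hi) / 2
    if f(mid) > 0:
        lo = mid
    else:
        hi = mid
root = (lo + hi) / 2

# Asymptote crossing: log(a_j) - b_j x = 0, where j minimises b_i.
j = min(range(len(b)), key=lambda i: b[i])
approx = log(a[j]) / b[j]
print(root, approx)           # root ~1.6, asymptote estimate ~1.39
```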



Ubuntu HowTo: 20.04 Frozen on Boot Screen


I recently dual-booted, installing Ubuntu 20.04 on my Thinkpad X1 Extreme Gen 1 alongside Windows 10, and the installation was working perfectly. The Nvidia proprietary driver was installed and I had no issues. I shut down, and upon trying to get back into Ubuntu, it froze. I can get to the GRUB menu and successfully boot back into Windows 10. If I instead select Ubuntu from GRUB, it shows the Lenovo logo with the Ubuntu spinner. The spinner disappears and it freezes there, so I never see the login screen. I have tried using nomodeset to get back in, but it still freezes. I have also gone into recovery mode and used dpkg to repair broken packages and update the GRUB bootloader. Nothing has worked. This happened previously and I opted to delete the partition and reinstall, but I am facing the same problem with a fresh install. The BIOS is updated and Secure Boot is off. What might the problem be?


Code Bug Fix: Why are normal quantiles used for sample size calculations instead of the t-distribution?


The sample size formula for a one-sample t-test is often given as:

$$ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{\Delta^2} $$

Meanwhile, G*Power appears to use the t-distribution, which gives larger $n$'s because of the heavier tails of the t-distribution. This strikes me as more accurate, since we'll be using the t-distribution for testing. Does using normal quantiles to calculate sample size then underestimate the necessary sample size for a given power and type-I error?

A typical way to use the sample size from the normal assumption would be to approximate the number of degrees of freedom in the t-distribution. Say your sample size calculation, using the normal distribution, gives a required sample size of 60. You would then use $n=60$ and $df=59$ in the sample size calculation with the $t_{df}$ distribution.

Yes, you underestimate the required sample size when you use the normal distribution, which underpowers the test. Whether or not this loss of power is worth going through the extra work of estimating the degrees of freedom and then redoing the calculation (a process you might iterate several times) will depend on the problem and the people working on the problem.
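
For concreteness, here is a sketch of the normal-quantile formula above in Python. The stdlib `statistics.NormalDist` supplies the $z$ quantiles; the $\alpha$, power, $\sigma$, and $\Delta$ values are illustrative assumptions, and the t-based refinement would additionally need a t quantile (e.g. from SciPy):

```python
# Normal-approximation sample size for a one-sample test, as in the
# formula above: n = (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / Delta^2.
from math import ceil
from statistics import NormalDist

def n_normal(alpha: float, power: float, sigma: float, delta: float) -> int:
    z = NormalDist().inv_cdf          # standard normal quantile function
    return ceil(((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

# Detect a shift of 5 with sd 15, two-sided alpha = 0.05, power = 0.80:
print(n_normal(0.05, 0.80, 15, 5))    # 71
```

The iterative t-based procedure described above would start from this $n$, set $df = n - 1$, replace the $z$ quantiles with $t_{df}$ quantiles, and repeat until $n$ stabilises.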


Server Bug Fix: qemu/KVM optimisation for heavy TPS and write workloads


I am running 30 VMs on RAID 0 SSDs.

The VM workloads are a heavy Docker environment (docker-compose to be precise, running around ~40 containers).

CPU load average and RAM usage are both well within comfortable parameters on the server. Disk utilization, however, is massive, and iostat shows that the TPS and MB_wrtn figures are where the work is being done:

Device:  tps       MB_read/s   MB_wrtn/s   MB_read   MB_wrtn
md5      2825.57   3.28        28.15       1673116   14379843

Currently I’m defining my VM disks as follows:

<driver name='qemu' type='raw' cache='none' io='threads'/>
<source file='os.img' aio='native'/>

My VM host is using kernel 3.10.0-1062.18.1.el7.x86_64 on CentOS Linux 7 (Core) and, as a result, the deadline scheduler. The guests are using a much newer kernel and have defaulted to the mq-deadline scheduler.

I’m struggling to find any real information about optimisation, and lots of conflicting advice about which caching/io strategies to use.

This is really difficult to benchmark – the heaviest part of the workload can take 2-3 hours to kick in, and it’s only for about a ~30 minute period that disk utilization gets hammered – but this is a crucial part of the work, and it is causing major slowness in comparison to when the disk utilisation is lower.

My questions are therefore:

  • What combination of cache, io and aio will offer the best performance for a high TPS/write workload?
  • Should I use iothreads given that I have “spare” CPU resource?
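
For reference, a dedicated I/O thread is declared in the libvirt domain XML roughly as follows. This is a sketch based on libvirt's documented `<iothreads>` element and the disk driver's `iothread` attribute; the thread count, image path, and virtio target are assumptions, not taken from the definitions above:

```xml
<domain type='kvm'>
  <!-- one dedicated I/O thread, assigned to disk vda below -->
  <iothreads>1</iothreads>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <source file='os.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
```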

In addition, specifically related to my host’s kernel version:

  • Should I upgrade to kernel version 3.17+ to access blk-mq (block multi-queue)?
  • If so, how do I enable this in my QEMU/KVM setup/definitions?
  • What’s the best guest scheduler to use with blk-mq – is it just none?

I will be awarding a bounty for this question as soon as I am able.

=== EDIT ===

  • I have updated the iostat output above after a full, usual workload
  • We’re using 4x 2TB Samsung PM883 SSDs in a RAID0 array (software RAID)
  • Added some benchmarking stats below:

fio / ioping from the host’s RAID0 array

Jobs: 1 (f=1): [m(1)][100.0%][r=421MiB/s,w=139MiB/s][r=108k,w=35.7k IOPS][eta 00m:00s]
---
9 requests completed in 1.65 ms, 36 KiB read, 5.45 k iops, 21.3 MiB/s

fio / ioping from the host’s RAID1 OS array

Jobs: 1 (f=1): [m(1)][100.0%][r=263MiB/s,w=86.7MiB/s][r=67.3k,w=22.2k IOPS][eta 00m:00s]
---
9 requests completed in 1.50 ms, 36 KiB read, 6.00 k iops, 23.5 MiB/s

fio from a VM’s vda device

Jobs: 1 (f=1): [m(1)] [100.0% done] [246.2MB/82458KB/0KB /s] [62.1K/20.7K/0 iops] [eta 00m:00s]

I have tried tuning each VM to a maximum of 2300 read/800 write iops – but, for some bizarre reason, this makes things significantly worse. I encounter a much higher number of timeouts and job failures.

Here’s what Grafana looks like during the middle of a workload:

[Grafana dashboard screenshot]

Probably the root of the problem is that you have SSDs with a SATA interface. SATA has only one command queue, i.e. multiple commands on the SATA bus cannot be in flight in parallel. SAS improves this a bit, but it really got massively better with NVMe, which has up to 64k independent queues.

Increasing the queue depth in SATA increases the latency of single operations (while increasing the throughput). This is probably what happened when you increased VM IOPS.

I wouldn’t expect a huge gain for a SATA SSD with blk-mq. Same goes for more iothreads. They can’t work around contention on the SATA interface.

Although this is probably not what you are looking for, I guess your best option is a hardware upgrade to NVMe based SSDs. Scheduler tuning cannot work around hardware limitations.


Math Genius: Elements in $K_0(A)$


Let $A$ be a $C^*$-algebra, unital or not.

  1. I want to show that each element in $K_0(A)$ is of the form

$$[p]_0 - \bigg[ \begin{pmatrix}
1_n & 0_n \\
0_n & 0_n \\
\end{pmatrix} \bigg]_0$$

for some projection $p \in M_{2n}(\tilde A)$ satisfying the following, which I will call (A):

$$p - \begin{pmatrix}
1_n & 0_n \\
0_n & 0_n \\
\end{pmatrix} \in M_{2n}(A)$$

  2. And I want to show that an element $p$ in $M_{2n}(\tilde A)$ satisfies (A) if and only if $s(p)=\operatorname{diag}(1_n, 0_n)$.

Idea:

  1. By definition $K_0(A)= \lbrace [p]_0 - [s(p)]_0 : p \in \mathcal{P}_\infty (\tilde A) \rbrace$, where as far as I can tell $s(a +\alpha 1)= \alpha 1$ for all $a \in A$ and all $\alpha \in \mathbb{C}$. My book says that "the image of $s_n$ is the subset $M_n(\mathbb{C})$ of $M_n(\tilde A)$ consisting of all matrices with scalar entries, and $x-s_n (x)$ belongs to $M_n(A)$ for all $x$ in $M_n(\tilde A)$", so my question is what exactly this means. Does this mean that:

$$s(p)= \begin{pmatrix}
1_n & 0_n \\
0_n & 0_n \\
\end{pmatrix}$$

As this seems too easy, I don't think it is true. Or is there another way to show this?

  2. I think that the $\Leftarrow$ direction should follow from the passage in my book, but I am not quite sure.

Math Genius: Prove that there exists $c\in[0,1]$ such that $\int_0^cf(t)\,dt=f(c)^3.$


Question: Let $f:[0,1]\to\mathbb{R}$ be a continuous function with $\int_0^1f(t)\,dt=0$. Prove that there exists $c\in[0,1]$ such that $$\int_0^cf(t)\,dt=f(c)^3.$$

Solution: Let $g:[0,1]\to\mathbb{R}$ be such that $$g(x)=\int_0^xf(t)\,dt-f(x)^3, \quad \forall x\in[0,1].$$

Now since $f$ is continuous $\forall x\in[0,1]$, by the first fundamental theorem of calculus we can conclude that $g$ is continuous $\forall x\in[0,1]$.

Thereafter, observe that $g(x)=0$ for some $x\in[0,1] \iff \int_0^xf(t)\,dt=f(x)^3$ for some $x\in[0,1]$. Hence, to prove the statement of the problem it is sufficient to show that $g(c)=0$ for some $c\in[0,1]$.

Now $g(0)=-f(0)^3$ and $g(1)=-f(1)^3$.

Observe that if $f(0)$ and $f(1)$ are of different signs, then $g(0)$ and $g(1)$ are also of different signs, in which case, by the IVT, we can conclude that $\exists c\in(0,1)\subset[0,1]$ such that $g(c)=0$. Hence, we are done in this case.

Again, if $f(0)=0$ or $f(1)=0$, then at least one of $g(0)$ and $g(1)$ is $0$, in which case we are done.

Now, we are left with the case that $f(0)$ and $f(1)$ are of the same sign. Thus, let us assume WLOG that $f(0)>0$ and $f(1)>0$. Hence, $g(0)<0$ and $g(1)<0$. Now since $\int_0^1f(t)\,dt=0$ and $f(0),f(1)>0$, $\exists$ at least two points $a,b\in(0,1)$, with $b>a$, satisfying $f(a)=f(b)=0$. Thus, we can conclude that $\exists c_1\in(0,1)$ such that $f(x)>0\ \forall x\in[0,c_1)$ and $f(c_1)=0$. Hence, we have $$g(c_1)=\int_0^{c_1}f(t)\,dt-f(c_1)^3=\int_0^{c_1}f(t)\,dt>0.$$ Thus, we have $g(c_1)>0$ and $g(1)<0$, which implies, by the IVT, that $\exists c\in(c_1,1)\subset[0,1]$ such that $g(c)=0$. Hence, we are done in this case too.

Hence, we are done with all the cases, and in each case we have shown that $\exists c\in[0,1]$ such that $g(c)=0$. Thus, we are done.

Is this solution correct and rigorous enough? If yes, is there any alternative solution?

Appears correct and mostly rigorous to me, and also clear and not too long. A proof by the IVT is a valid idea. Two points:

  1. Are you sure that you're applying the second FTOC, not the first FTOC?
  2. You say that there exists $0<c_1<1$ such that $f(x)>0$ for all $0\leq x < c_1$ and $f(c_1)=0$. How do you know that the $f(x) > 0$ for all $0\leq x < c_1$ part holds?

For part 2, you essentially need to show that $f$ has a smallest positive zero (assuming, for instance, that $f(0)>0$). Can you do it?


As a final note, the proof you gave allows one to slightly generalise the result. Namely, instead of the cube function you can use any continuous $h: [0, 1] \to \mathbb{R}$ with $h(0) = 0$ that preserves sign at $f(0)$ and $f(1)$; i.e. instead of $f(c)^3$ you could put $h(f(c))$ without any trouble. Here is a slightly altered proof which, similarly to your proof, works for this generalised case (with details to be filled in by the reader). Sign preservation of $h$ is irrelevant at $f(1)$ for this proof, hence may be omitted.

Proof. For concreteness, let $f(0) > 0$. Let $x_1$ be the smallest positive zero of $q(x) := \int\limits_{0}^{x}f(t)\,\mathrm{d}t$. We may assume $f(x_1) < 0$. Then by the IVT, at some point $z$ in $(0, x_1)$ it is the case that $f(z) = 0$ and $q(z) > 0$. Therefore, $g := q - h(f)$ will have changed sign in $(0, z)$, completing the proof.
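
As a numerical sanity check of the statement (my own example, not part of either proof): take $f(t)=\cos(2\pi t)$, which satisfies $\int_0^1 f(t)\,dt=0$, and locate a $c$ with $\int_0^c f(t)\,dt=f(c)^3$ by bisection on $g$:

```python
# Numeric illustration: f(t) = cos(2*pi*t) has integral 0 over [0,1];
# bisection on g(x) = F(x) - f(x)^3 locates a c with F(c) = f(c)^3.
from math import cos, sin, pi

def f(t):
    return cos(2 * pi * t)

def F(x):                     # antiderivative of f with F(0) = 0
    return sin(2 * pi * x) / (2 * pi)

def g(x):
    return F(x) - f(x) ** 3

# g(0) = -1 < 0 and g(0.25) = 1/(2*pi) > 0, so a root lies in (0, 0.25).
lo, hi = 0.0, 0.25
for _ in range(100):
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
print(c, F(c), f(c) ** 3)
```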
