Code Bug Fix: XString breakpoint thrown when defining string

Original Source Link

Hello i’ve had this break point being thrown by xstring for the last two days and I can’t seem to get rid of it. First off the class i’m working with or at least the relevant bit.

class myProjects {
public:
    std::wstring wstr;
    const wchar_t* BrowserPtr = wstr.c_str();
    std::string browser = "undeffined";
};

with that out of the way here is whats generating the error

void myProjects::setBrowser(std::string &str) {
    std::string cpy = str;
    browser = str; // this is the problimatic line
    wchar_t temp[21];
    size_t outSize;
    mbstowcs_s(&outSize, temp, sizeof temp, str.c_str(), str.size());
    wstr.assign(temp);
}

at least that’s my goal the string definition alone still throws the error.
so here’s the error.

Exception thrown: read access violation. this was 0x14.

and the surrounding code snippet

    basic_string& assign(_In_reads_(_Count) const _Elem* const _Ptr, _CRT_GUARDOVERFLOW const size_type _Count) {
        // assign [_Ptr, _Ptr + _Count)
        if (_Count <= _Mypair._Myval2._Myres) {  //here
            _Elem* const _Old_ptr   = _Mypair._Myval2._Myptr();
            _Mypair._Myval2._Mysize = _Count;
            _Traits::move(_Old_ptr, _Ptr, _Count);
            _Traits::assign(_Old_ptr[_Count], _Elem());
            return *this;
        }

Now I’ve tried a lot to get rid of this and I know that if I watch the variable browser in the debugger it says its information is unreadable and calling things like get capacity throw errors on read access as well. I do use strings else where in the program and even change wstrings and string back and forth but this is a new error. I also tried making browser private and a getBrowser function that was this.

People seemed to think by unreadable I meant garbage so I attached a screenshot of the dubugger.enter image description here

std::string myProjects::getBrowser() {
    //std::string s(wstr.begin(), wstr.end());
    std::string newStr = "";
    return newStr;
}

with or without commented out line it throws same error. Further more putting the copy and pasted code snippet that generate the error into the error alone don’t in a new project. Im just looking for some insight as to why this , might be happening. I am not using pointers in either function and the 0x14 is weird but if I try to reserve space same error. Frankly i’m just really confused and don’t know what to do. I realize i didn’t give something to throw in a compiler and i’m sorry but I really have no idea where the error would be dirrvided from or what types of things cause this error all i could find online was null pointers.
sorry for typos.

A possible problem can be found in class myProjects:

class myProjects {
public:
    std::wstring wstr;
    const wchar_t* BrowserPtr = wstr.c_str();
    std::string browser;
};

When a variable of type myProjects is constructed, first wstr is constructed with an empty string, then BrowserPtr is initialized with a pointer to the data portion of the then empty string wstr.

However, as soon as you assign anything to wstr, it is quite likely that it frees the internal data buffer, and allocates a new one to hold the new string. This means that BrowserPtr is now pointing to a piece of memory that has been freed.

As PaulMcKenzie already mentions, don’t store BrowserPtr this way, omit it from the class. Instead, the moment you need to pass a C string to another function, call wstr.c_str().

Another issue is with the multibyte conversion:

wchar_t temp[500];
size_t outSize;
mbstowcs_s(&outSize, temp, size_t(256), str.c_str(), str.size());

First, why have an array of length 500 but pass 256 to mbstowcs_s()? It is better to ensure the values match:

mbstowcs_s(&outSize, temp, sizeof temp / sizeof *temp, str.c_str(), str.size());

Also note that if the input string is as large or larger than the length you pass to mbstowcs_s(), then there will be no null wide character written to the output. Either you have to check the result to ensure the result is shorter than the size of the output buffer, or you have to ensure you always add a null wide character at the end of the output buffer:

temp[(sizeof temp / sizeof *temp) - 1] = L'';

Of course, it’s even better to first call mbcstowcs(nullptr, str.c_str(), 0) to get the required length of the output buffer. Then it is possible in C++17 to resize wstr to the right length, and pass wstr.data() to mbcstowcs() as the output buffer.

Tagged : /

Code Bug Fix: Structure binding doesn’t work without type?

Original Source Link

I try to do something like this

pair<int, int> f() {
    return {1, 2};
}

int a, b;
[a, b] = f();

and get compilation error. All errors boil down to that fact, that the compiler think that it is a syntactically incorrect lambda. So if I use structured binding I always should write auto?

Yes, that’s just how structured bindings work. It introduces new identifiers, so they cannot be existing variables. So when using it, you must use auto, e.g.

auto [a, b] = f();   // can also be auto&, auto const, etc

If you want to use existing variables, you can use std::tie:

int a, b;
std::tie(a, b) = f();

Note that all the identifiers must be new, so you can’t mix and match these 2 approaches. e.g. you can’t use a structured binding with a single already existing variable.

Tagged : / /

Code Bug Fix: How to implement a blocking processing loop?

Original Source Link

I’d like to implement a processing loop in a worker thread so that it processes data from a queue when there’s something and blocks (the thread sleeps) otherwise…is this even possible? Should also work without any noticeable delays.

Something simple like this:

std::deque<Foo> queue;

void worker() 
{
    while (active) {
        blockAndWaitForData();

        while (!queue.empty()) {
            doSomething(queue.front());
            queue.pop_front();
        }
    }
}

Of course the queue would need to be locked plus some other details.

The Linux API could also be used directly if needed.

There is something that will suit your needs in C++11 standard. It is condition_variable. It allows you to wait until it is notified by other thread. So your worker could wait until producer notifies it like this. Note this is very dumbed down example and in most situations insufficent but gives you gist how to do it

std::deque<int> q;
std::mutex m;
std::condition_variable cv;

void worker() {
    while (active) {
        std::deque<int> vals;
        {
            std::unique_lock<std::mutex> l(m);
            cv.wait(l, []{return q.empty();});
            vals = std::move(q);
            q.clear();
        }
        for (const auto& val : vals)
            doSomething(val);
    }
}

void producer() {
    while (active) {
        {
            std::unique_lock<std::mutex> l(m);
            q.push_back(produce());
        }  
        cv.notify_one();
   }
}

Tagged : / / / /

Code Bug Fix: C++17, initialise an array

Original Source Link

I’m (re)learning C++ and want to initialise an array of objects.

struct Pea{
    Pea(double lower, double upper){
        static std::default_random_engine generator;
        static std::uniform_real_distribution<double> distribution(lower,upper);
        weight = distribution(generator);
    }
    double weight;
};

class Items{
    std::array<Pea,10> peas();
public:
    Items(){

    }
    void show(){
        std::for_each(begin(peas),end(peas),
                      [](auto pea){
                          std::cout << pea.weight << std::endl;

                      });
    }
};

The example is a big silly but it’s just for learning. I want to initialise the array peas with random weights. But I want to specify the lower and upper limits of random.

The line std::array<Pea,10> peas(1.2,2.3); does not compile, as expected, can anyone suggest a ‘modern’ way of doing this.

Thanks

Initializing the member in the constructor in case you need to depend on the constructor parameters:

class Items{
    std::array<Pea, 3> peas; // no ()
public:
    Items() : peas({ {0., 1.}, {2., 3.}, {4., 5.} }) {
    }
}

Otherwise, you can initialize it directly:

class Items{
    std::array<Pea, 3> peas{{ {0., 1.}, {2., 3.}, {4., 5.} }};
    // ...
}

Another way:

class Items{
    std::array<Pea, 3> peas = {{ {0., 1.}, {2., 3.}, {4., 5.} }};
    // ...
}

This declares a function called peas that returns a std::array<Pea,10>, and that won’t work.

std::array<Pea,10> peas();

Here’s my take on what needs to be done:

#include <algorithm>
#include <array>
#include <iostream>
#include <random>

struct Pea{
    Pea(double lower, double upper) {
        // the prng should be seeded - or else it'll start with the same default
        // seed every time. std::random_device is usually good enough, but if you use
        // an old version of MinGW, extra care needs to be taken (see other answers
        // for that):
        static std::default_random_engine generator{std::random_device{}()};

        // don't make the distribution static or else all Peas will share the same
        // distribution:
        std::uniform_real_distribution<double> distribution(lower, upper);

        weight = distribution(generator);
    }
    double weight;
};

class Items{
    std::array<Pea, 4> peas;

public:
    Items() : // use the member initializer list
        peas{{ {0., 1.}, {2., 3.}, {4., 5.}, {6., 7.} }}
    {}

    void show(){
        std::for_each(begin(peas),end(peas),
                      [](auto pea){
                          std::cout << pea.weight << std::endl;
                      });
    }
};

int main() {
    Items i;
    i.show();
}

If you have no other use for the Items constructor, you can initialize peas directly:

class Items{
    std::array<Pea, 4> peas{{ {0., 1.}, {2., 3.}, {4., 5.}, {6., 7.} }};

    // ...
};

Tagged : /

Server Bug Fix: N-Body Optimization

Original Source Link

I’ve created a serial C++ code for gravitational N-Body calculation. Because I expect to have upwards of 8-71 sparse bodies (ie, where Barnes-Hut is not necessarily practical) and running for long periods of time, I want to make as much use of parallelization and vectorization as possible. I did try a method with mutex and conditional_variable however, I found that this implementation works significantly faster: locking and unlocking mutexes proved more overhead for relatively short functions for threads. Forgive my probably obnoxious attempt at this, it is my first attempt at anything parallel and/or vectorized and I’m still new with C++, so I expect there will be plenty of criticism.

It is just two classes, Body and NBody and a helper namespace mathx.

Body.h

#pragma once

#include <immintrin.h>
#include <intrin.h>

struct Body {
    __m256d pos, vel;
    double mu;

    Body();
    Body(double MU, const __m256d& position, const __m256d& velocity);
    Body(const Body& orig);
    ~Body();

    virtual __m256d grav(const __m256d & R) const;
    void push(const __m256d & acc, const __m256d & dt);
};

Body.cpp

#include "Body.h"
#include <cmath>

Body::Body() {
    mu = 1;
    pos = _mm256_setzero_pd();
    vel = _mm256_setzero_pd();
}

Body::Body(double MU, const __m256d& position, const __m256d& velocity){
    pos = position;
    vel = velocity;
    mu = MU;
}

Body::Body(const Body& orig) {
    pos = orig.pos;
    vel = orig.vel;
    mu = orig.mu;
}

Body::~Body() {
}

__m256d Body::grav(const __m256d & R) const {
    const double g = mu/(R[3]*R[3]*R[3]);
    return _mm256_mul_pd(_mm256_broadcast_sd(&g),R);
}

void Body::push(const __m256d & acc, const __m256d & dt){
    vel = _mm256_fmadd_pd(acc,dt,vel);
    pos = _mm256_fmadd_pd(vel,dt,pos);
}

NBody.h


#pragma once

#include "orbital/Body.h"
#include <vector>
#include <atomic>
#include <stdint.h>
#include <thread>

class alignas(32) NBody {
public:  
    NBody();
    ~NBody();

    void addBody(const Body & b);

    void par_leapfrog(double time);
    void par_step();

    void setTime(double time);
    void setTimestep(double step);
    void setTimeInterval(double t_interval);

    void output(std::string filename);

private:

    // Body Stuff
    std::vector< Body > bodies;

    std::vector< double > times;
    std::vector< std::vector< double * > > positions; // for some reason cant store __m256d

    void setup();
    void getNThreads();
    void leapfrog_halfstep();

    // Time Stuff
    double t = 0., dt = 5, time_interval = 3600.0, t_test = 0.;
    __m256d _dt;

    // Gate / Parallel Stuff
    std::atomic<uint_fast8_t> nFinished = 0;
    bool done = false;
    bool step = false;
    bool accelerate = false;
    bool push = false;

    // Thread Function
    void worker();

    // Internal Variables
    uint_fast8_t nBodies,nThreads,nR;
    std::atomic<uint_fast8_t> idxR, idxBody; 
    __m256d * R; // array of vector distance between bodies

};

NBody.cpp

#include "NBody.h"
#include <utility>
#include "geometry/mathx.h"
#include <iostream>
#include <string>
#include <cmath>

NBody::NBody() {
    _dt = _mm256_broadcast_sd(&dt);
}

NBody::~NBody() {  
}

void NBody::addBody(const Body & b){
    bodies.push_back(b);  
}

void NBody::par_leapfrog(double time){
    setup();

    leapfrog_halfstep(); // single threaded half step

    std::thread body_threads[nThreads];

    for(uint_fast8_t i = 0; i < nThreads; i++){
        body_threads[i] = std::thread(&NBody::worker, this);
        body_threads[i].detach();
    }

    while(t < time) {

        par_step();

        if(t > t_test) {
            times.push_back(t);
            t_test += time_interval;
        }

        t += dt;        
    }

    done = true;  
    // threads will destroy here
}

void NBody::setup() {
    t_test = t;
    nBodies = bodies.size();
    done = false;
    positions.resize(nBodies);
    nR = mathx::combination(nBodies,2);
    R = new __m256d[nR];

    // reset this
    step = false;
    accelerate = false;
    done = false;

    getNThreads();
}

void NBody::leapfrog_halfstep() {

    // single thread this for convenience
    __m256d acc;
    __m256d dt2 = _mm256_set_pd(dt/2,dt/2,dt/2,dt/2);
    for(uint_fast8_t i = 0; i < nBodies;i++) {
        acc = _mm256_setzero_pd();
        for(uint_fast8_t j = 0; j < nBodies; j++) {
            if(i != j) {
                __m256d R_tmp = _mm256_sub_pd(bodies[j].pos,bodies[i].pos);
                __m256d tmp = _mm256_mul_pd(R_tmp,R_tmp);
                R_tmp[3] = sqrt(tmp[0]+tmp[1]+tmp[2]);
                acc = _mm256_add_pd(acc,bodies[j].grav(R_tmp));
            }
        }
        bodies[i].vel = _mm256_fmsub_pd(acc,dt2,bodies[i].vel);
    }
}

void NBody::getNThreads() {
    int max = std::thread::hardware_concurrency()-1;
    if (nBodies < max){
        nThreads = nBodies;
    } else {
        if (max > 0) {
            nThreads = max;
        } else {
            nThreads = 2;
        }
    }
}

void NBody::par_step(){  
    // Gate 1
    idxR = 0; 
    nFinished = 0;
    step = true;
    while(nFinished < nThreads){}
    step = false;
    // Gate 2
    idxBody = 0;
    nFinished = 0;
    accelerate = true;
    while(nFinished < nThreads){}
    accelerate = false;
}


void NBody::worker() {
    __m256d acc;
    uint_fast8_t i_body,j_body,ix,ix1;


    // Generate indexes locally
    uint_fast8_t is[nR];
    uint_fast8_t js[nR];
    uint_fast8_t idx_R[nBodies][nBodies];

    unsigned int count = 0;
    for ( i_body = 0; i_body < nBodies;i_body++) {
        for( j_body = i_body+1; j_body < nBodies; j_body++) {
            is[count] = i_body;
            js[count] = j_body;
            count++;
        }
    } 

    for(i_body = 0; i_body < nBodies; i_body++){
        for(j_body = 0; j_body < nBodies; j_body++) {
            if(j_body > i_body) {
                idx_R[i_body][j_body] = (i_body*nBodies + j_body - mathx::combination(i_body+2,2));
            } else {
                idx_R[i_body][j_body] = (j_body*nBodies + i_body - mathx::combination(j_body+2,2));
            }
        }    
    }

    while (!done) { 

        while(!step){if(done) return;}

        while(idxR < nR) {
            ix = idxR.fetch_add(2);
            if(ix >= nR) {
                break;
            }

            ix1 = ix+1;

            __m256d dr1 = _mm256_sub_pd(bodies[js[ix]].pos,bodies[is[ix]].pos); 
            __m256d dr1_sq = _mm256_mul_pd( dr1,dr1 );

            if(ix1 < nR) {

                __m256d dr2 = _mm256_sub_pd(bodies[js[ix1]].pos,bodies[is[ix1]].pos); 
                __m256d dr2_sq = _mm256_mul_pd( dr2,dr2 );

                __m256d temp = _mm256_hadd_pd( dr1_sq, dr2_sq );
                __m128d hi128 = _mm256_extractf128_pd( temp, 1 );
                __m128d dotproduct_sqrt = _mm_sqrt_pd(_mm_add_pd( _mm256_castpd256_pd128(temp), hi128 ));

                dr1[3] = dotproduct_sqrt[0];
                dr2[3] = dotproduct_sqrt[1];

                R[ix] = std::move(dr1);
                R[ix1] = std::move(dr2);

            } else {

                dr1[3] = sqrt(dr1_sq[0]+dr1_sq[1]+dr1_sq[2]);
                R[ix] = std::move(dr1);

            }
        }

        nFinished++;

        while(!accelerate){}

        while(idxBody < nBodies) { // this check is quick and avoids having to fetch add again
            i_body = idxBody++;
            //i_body = idxBody.fetch_add(1);
            if(i_body >= nBodies){
                break;
            }

            // Store position prior to push
            if (t > t_test) {
                double pos[] = new double[3]{bodies[i_body].pos[0],bodies[i_body].pos[1],bodies[i_body].pos[2]}; 
                positions[i_body].push_back(pos));
            }

            // sum gravitational acclerations
            acc = _mm256_setzero_pd();
            for(j_body = 0; j_body < nBodies; j_body++) {
                // reverse vector (subtract) if index are reverse order
                if(j_body > i_body) {
                    acc =_mm256_add_pd(bodies[j_body].grav(R[idx_R[i_body][j_body]]),acc);
                } else if (j_body < i_body) {
                    acc =_mm256_sub_pd(bodies[j_body].grav(R[idx_R[i_body][j_body]]),acc);
                }
            }

            bodies[i_body].push(acc,_dt);

        }

        nFinished++;
    }

}


void NBody::setTime(double time){
    t = time;
}

void NBody::setTimestep(double step){
    dt = step;
    _dt = _mm256_broadcast_sd(&dt);
}

void NBody::setTimeInterval(double t_interval){
    time_interval = t_interval;
}

mathx.h

#pragma once

#include <vector>
#include <utility>

#define UINT unsigned int

namespace mathx {

    double legendrePoly(UINT n, double x);

    double assocLegendrePoly(UINT l, UINT m, double x);

    const unsigned long long factorial[] = {1,1,2,6,24,120,720,5040,40320,362880,3628800,39916800,479001600,6227020800,87178291200,1307674368000,20922789888000,355687428096000,6402373705728000,121645100408832000,2432902008176640000};

    double generalBinomial(double alpha, UINT k);

    const UINT C[11][11] = {{1},{1,1},{1,2,1},{1,3,3,1},{1,4,6,4,1},{1,5,10,10,5,1},{1,6,15,20,15,6,1},{1,7,21,35,35,21,7,1},{1,8,28,56,70,56,28,8,1},{1,9,36,84,126,126,36,9,1},{1,10,45,120,210,252,210,120,45,10,1}};

    UINT combination(UINT n, UINT k);

}

mathx.cpp


#include "mathx.h"
#include <cmath>

namespace mathx {

    double legendrePoly(UINT n, double x){
        if (n == 0)
            return 1;
        if (n == 1)
            return x;

        double sums = 0;

        for (UINT k = 0; k < n; k++) { 
            if (k > 3){
                sums += pow(x,k) * (combination(n,k) * generalBinomial((n+k-1)*0.5,n));
            } else {
                if(k == 0) {
                    sums += generalBinomial((n+k-1)*0.5,n);
                } else {
                    if(k == 1) {
                        sums += x * n * generalBinomial((n+k-1)*0.5,n);
                    } else {
                        sums += x * n * generalBinomial((n+k-1)*0.5,n);
                    }
                }
            }
        }
        return (1<<n) * sums;
    }

    double assocLegendrePoly(UINT l, UINT m, double x){
        int sums = 0;
        for (UINT k = m; k <= l; k++) {
            int prod = k;
            for (UINT j = m; m < k; m++)
                prod *= j;
            sums += prod* pow(x,k-m) * combination(l,k) * generalBinomial((l+k-1)*0.5,l);
        }
        if (m % 2 == 0)
            return (1<<l) * pow((1-x*x),m/2) *sums;
        else
            return -1 * (1<<l) * pow((1-x*x),m*0.5) *sums;
    }

    double generalBinomial(double alpha, UINT k){
        // this can be further optimized for half values required by legendre
        double res = 1;
        for (UINT i = 1; i <= k; ++i)
            res = res * (alpha - (k + i)) / i;
        return res;
    }

    UINT combination(UINT n, UINT k) {
        if(n <= 10) {
            return C[n][k];
        }
        if(k > n/2){
            return combination(n,n-k);
        }
        UINT num = n;
        UINT den = k;
        //vectorizable
        for(UINT i = 1; i < k; i++){
            den *= i;
            num *= (n-i);
        }
        return num/den;
    }
}

Thanks in advance!

EDIT:

Adding some of my testing calls that I used, really basic stuff I just inserted into a main function.


int test_parallel(int n, double t) {
    //unsigned seed1 = std::chrono::system_clock::now().time_since_epoch().count();
    std::default_random_engine generator;

    std::uniform_real_distribution<double> mus (1.0,2.0);
    std::uniform_real_distribution<double> xs (-2.0,2.0);

    NBody sim;

    for(int i = 0; i<n;i++) {
        sim.addBody(Body(mus(generator),_mm256_set_pd(0.0,xs(generator),xs(generator),xs(generator)),_mm256_set_pd(0.0,xs(generator),xs(generator),xs(generator))) );
    }

    std::cout << "start test 3 n";
    auto t1 = std::chrono::high_resolution_clock::now();
    sim.par_leapfrog(t);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "test function took " << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count() << " milliseconds n";
    return 0;
}

int testBody() {

    Body B = Body(2, _mm256_set_pd(0.0,1.0,1.0,1.0),_mm256_set_pd(0.0,-1.0,-1.0,-1.0));

    __m256d dt = _mm256_set_pd(1.0,1.0,1.0,1.0);
    __m256d acc = _mm256_set_pd(2.0,2.0,2.0,2.0);

    B.push(acc,dt);

    if(abs(B.pos[0]-2.0) < 1e-12 && abs(B.pos[1]-2.0) < 1e-12 && abs(B.pos[2]-2.0) < 1e-12) {
        if(abs(B.vel[0]-1.0) < 1e-12 && abs(B.vel[1]-1.0) < 1e-12 && abs(B.vel[2]-1.0) < 1e-12) {
            return 0;
        } else {
            return 2;
        }
    } else {
        return 1;
    }

}

int testGravity() {

    Body B = Body();
    B.mu = 16;

    __m256d R = _mm256_set_pd(2.0,0.0,2.0,0.0);
    __m256d g = B.grav(R);

    if(abs(g[1]-4.0) < 1e-12 ) {
        if(abs(g[0]) > 1e-12 ) {
            return 2;
        } 
        return 0;
    } else {
        return 1;
    }

}

```

Data layout

You have already experienced first-hand a disadvantage of using “1 physics vector = 1 SIMD vector” (such as __m256d pos), causing some friction when coordinates come together:

__m256d temp = _mm256_hadd_pd( dr1_sq, dr2_sq );
__m128d hi128 = _mm256_extractf128_pd( temp, 1 );
__m128d dotproduct_sqrt = _mm_sqrt_pd(_mm_add_pd( _mm256_castpd256_pd128(temp), hi128 ));

Mixing different coordinates in the same SIMD vector leads to horizontal addition and shuffles and extraction and such. Horizontal addition is relatively expensive, equivalent to two shuffles plus a normal addition. _mm256_castpd256_pd128 is free, but extracting the upper half is not.

That strategy of using the 4th component for a different value is also a problem, causing even more extract/insert operations. As a rule of thumb, avoid indexing into SIMD vectors. It’s fine to use that construct a bit in a pinch, but I would say it’s overused here.

There is an alternative: put the X components of 4 physics vectors together into a SIMD vector, Y in an other SIMD vector, etc. You could have groups of 4 bodies together (AoSoA), or a big array of just X and an other of Y and so on (SoA).

That’s a significant rewrite, but I recommend it. That Vec3 that was mentioned, I recommend against the entire idea. It’s still using SIMD against the grain. It’s a really “attractive looking trap”, letting you express the computation in a way that feels nice, but it’s not a way that results in good code.

Unnecessary move

Moving SIMD vectors is not useful. They’re trivial to copy and hold no resource.

Alignment

Aligning NBody aligns its first field, which is an std::vector (so the vector object itself, not the data it holds). That’s not useful, but also not harmful. std::vector should, as of C++17, respect the alignment of the data inside it (before 17, that was simply broken).

Scary synchronization

bool accelerate should not be used for synchronization, it makes this construct unsafe: while(!accelerate){}. That loop may not terminate, or it may work as intended, it’s not reliable. Using atomic<bool> would make the threads communicate safely.

Basics:

Body.h/Body.cpp

The class Body is extremely simple and all its functions are under 5 lines. Calling a function is a relatively heavy operation and calling a virtual function is even more so. Putting but a few operations inside a function will make it an inefficient call. Unless, the function is inlined. The compiler cannot inline functions that are hidden from compilation – so you should move all the quick functions to the header and keep cpp for the heavier stuff.

P.S. why does this class even have a virtual function? you don’t utilize the property anywhere.

Multithreading:

Inherently, when you multithread your code, the computer has to do more work. All the data synchronization and memory-ownership swapping is not cheap for low-level code. So it is quite possible that the single threaded version would run faster – or at the same speed just with single core at maximal capacity instead of all of them.

If the number of bodies would be huge, like a few thousands, then perhaps multi-threading will improve performance. Though, the exact numbers surely depends on the platform and implementation.

You should read more on std::atomic as regular operations like ++, --, +=, -=, = are slow and usually unnecessarily so. You should read its memory model and use operations like load, store, fetch_add... with appropriate memory instructions.

Linear Algebra:

As suggested by @harold, you shouldn’t use __m256d for storing x,y,z coordinates of the body but rather store’s n-body’s coordinates in a 3xn matrix. Also this way you can perform matrix level operations and utilize SIMD types more efficiently: e.g., you don’t waste a coordinate and you can utilize AVX512 instructions which holds twice as much data as __m256d.

Algorithm:

You use a very basic and inaccurate algorithm for N-Body computation: V(t+dt) = V(t) +dt*a(t) and P(t+dt) = P(t)+dt*V(t+dt).
I think this is like first order of inaccuracy. What’s the point of running the simulation for a long time if it is of such a low accuracy?

You should check out better solutions like Runge–Kutta methods.

Tagged : / / / /

Code Bug Fix: lambda failed on C++14/17 and works on later version

Original Source Link

I have a priority queue definition got failed on C++14/17 but works with the later standard.
Would anyone tell me why?

    auto compare = [](const pair<int, int>& p1, const pair<int, int>& p2) {
        return p1.second < p2.second;
    };

    // compile failure on C++ 14/17 
    priority_queue<pair<int, int>, vector<pair<int, int>>, decltype(compare)> pq;

May I ask a future question? Why the first case with lambda is fine but the second is wrong:

auto comp = [&](const pair<string,int>& a, const pair<string,int>& b) {
    return a.second > b.second || (a.second == b.second && a.first < b.first);
};
// OK!!!
// typedef priority_queue< pair<string,int>, vector<pair<string,int>>, decltype(comp)> my_priority_queue_t;
// my_priority_queue_t  pq(comp);

// BOMB!!!
priority_queue< pair<string,int>, vector<pair<string,int>>, decltype(comp)> pq;

You call the following constructor of priority_queue:

priority_queue() : priority_queue(Compare(), Container()) { }

from reference. As you can see it calls Compare() – default constructor of comparator which in your case is closure type. But before C++20 closures were not default constructible. That is why this overload of constructor doesn’t work.

Since C++20 this all is fine because closures have:

ClosureType() = default;
(since C++20)(only if no captures are specified)

from lambdas reference.

Tagged : / /

Code Bug Fix: Inheriting class with multi parameter pack variadic template

Original Source Link

I have a variadic template class with multiple parameter packs, something like this:

template <typename... Types>
struct TopicsList { };

template <typename... Topics>
class TheParent;

template <typename... OutputTopics, typename... InputTopics>
class TheParent<TopicsList<OutputTopics...>, TopicsList<InputTopics...>>
{
};

I am trying to inherit from this class, like this:

template <typename... Topics>
class TheChild;

template <typename... OutputTopics, typename... InputTopics>
class TheChild<TopicsList<OutputTopics...>, TopicsList<InputTopics...>>
    : public TheParent<OutputTopics..., InputTopics...>, public ::testing::Test
{
};

But I am getting the following compilation error:

error: invalid use of incomplete type 'class {anonymous}::TheParent<TopicA, TopicB, TopicC>'
 class TheChild<TopicsList<OutputTopics...>, TopicsList<InputTopics...>>

Any clues why is it so would be highly appreciated!

What about

template <typename... OutputTopics, typename... InputTopics>
class TheChild<TopicsList<OutputTopics...>, TopicsList<InputTopics...>>
    : public TheParent<TopicsList<OutputTopics...>, TopicsList<InputTopics...>>, public ::testing::Test
// ....................^^^^^^^^^^^...............^^^^^^^^^^^^^^..............^
{
};

?

I mean… if your TheParent is declared only receiving a couple of TopicsList, you have to maintain the TopicsList wrapper passing the parameters from TheChild to TheParent.

Tagged : / / /

Server Bug Fix: Should I make my local variables const or movable?

Original Source Link

My default behaviour for any objects in local scopes is to make it const. E.g.:

auto const cake = bake_cake(arguments);

I try to have as little non-functional code as I can as this increases readability (and offers some optimisation opportunities for the compiler). So it is logical to also reflect this in the type system.

However, with move semantics, this creates the problem: what if my cake is hard or impossible to copy and I want to pass it out after I’m done with it? E.g.:

if (tastes_fine(cake)) {
  return serve_dish(cake);
}

As I understand copy elision rules it’s not guaranteed that the cake copy will be elided (but I’m not sure on this).

So, I’d have to move cake out:

return serve_dish(std::move(cake)); // this will not work as intended

But that std::move will do nothing useful, as it (correctly) will not cast Cake const& to Cake&&. Even though the lifetime of the object is very near its end. We cannot steal resources from something we promised not to change. But this will weaken const-correctness.

So, how can I have my cake and eat it too?

(i.e. how can I have const-correctness and also benefit from move semantics.)

I believe it’s not possible to move from a const object, at least with a standard move constructor and non-mutable members. However, it is possible to have a const automatic local object and apply copy elision (namely NRVO) for it. In your case, you can rewrite your original function as follows:

Cake helper(arguments)
{
   const auto cake = bake_cake(arguments);
   ...  // original code with const cake
   return cake;  // NRVO 
}

Then, in your original function, you can just call:

return serve_dish(helper(arguments));

Since the object returned by helper is already a non-const rvalue, it may be moved-from (which may be, again, elided, if applicable).

Here is a live-demo that demonstrates this approach. Note that there are no copy/move constructors called in the generated assembly.

It seems to me, that if you want to move, than it will be “const correct” to not declare it const, because you will(!) change it.
It’s ideological contradiction. You cannot move something and leave in place at the same time.
You mean, that object will be const for a part of time, in some scope. In this case, you can declare const reference to it, but it seems to me, that this will complicate the code and will add no safety.
Even vice versa, if you accidentally use the const reference to object after std::move() there will be problems, despite it will look like work with const object.

A limited workaround would be const move constructor:

class Cake
{
public:
    Cake(/**/) : resource(acquire_resource()) {}
    ~Cake() { if (owning) release_resource(resource); }

    Cake(const Cake& rhs) : resource(rhs.owning ? copy_resource(rhs.resource) : nullptr) {}
    // Cake(Cake&& rhs) // not needed, but same as const version should be ok.
    Cake(const Cake&& rhs) : resource(rhs.resource) { rhs.owning = false; }

    Cake& operator=(const Cake& rhs) {
        if (this == &rhs) return *this;
        if (owning) release_resource(resource);
        resource = rhs.owning ? copy_resource(rhs.resource) : nullptr;
        owning = rhs.owning;
    }
    // Cake& operator=(Cake&& rhs) // not needed, but same as const version should be ok.
    Cake& operator=(const Cake&& rhs) {
        if (this == &rhs) return *this;
        if (owning) release_resource(resource);
        resource = rhs.resource;
        owning = rhs.owning;
        rhs.owning = false;
    }
    // ...

private:
    Resource* resource = nullptr;
    // ...
    mutable bool owning = true;
};
  • Require extra mutable member.
  • not compatible with std containers which will do copy instead of move (providing non const version will leverage copy in non const usage)
  • usage after move should be considered (we should be in valid state, normally). Either provide owning getter, or “protect” appropriate methods with owning check.

I would personally just drop the const when move is used.

Tagged : / / /