in

Outlier removal techniques from an array


I know there’s a ton resources online for outlier removal, but I haven’t yet managed to obtain what I exactly want, so posting here, I have an array (or DF) of 4 columns. Now I want to remove the rows from the DF based on a column’s outlier values. The following is what I have tried, but they are not perfect.

def outliers2(data2, m = 4.5):
c=[]
data = data2[:,1] # Choosing the column
d = np.abs(data - np.median(data)) # deviation comoutation
mdev = np.median(d) # mean deviation
for i in range(len(data)):
    if (abs(data[i] - mdev) < m * np.std(data)):
        c.append(data2[i])            
return c

x = pd.DataFrame(outliers2(np.array(b)))
column = ['t','orig_w','filt_w','smt_w']
x.columns = column

#Plot
plt.rcParams['figure.figsize'] = [10,8]
plt.plot(b.t,b.orig_w,'o',label="Original",alpha=0.8) # Original
plt.plot(x.t,x.orig_w,'.',c="r",label="Outlier removed",alpha=0.8) # After outlier removal
plt.legend()

the plot illustrates how the results looks, red points after the outlier treatment over the blue original points. I would really like to get rid of those vertical group of points around the x~0 mark. What to do ?

A link to the data file is provided here : Full data



Source: https://stackoverflow.com/questions/70719806/outlier-removal-techniques-from-an-array

Covid: Coastguard set for busy staycation summer on 200th anniversary thumbnail

Covid: Coastguard set for busy staycation summer on 200th anniversary

Video Collections — For kids who like to watch same videos over and over again