Category Archives: Uncategorized

An audio dataset and IPython notebook for training a convolutional neural network to distinguish the sound of foosball goals from other noises using TensorFlow

So I was looking into ways to process audio data and build machine learning models on it. Here is a beautiful article by humblesoftwaredev that does just that. 🙂 Do read.

humblesoftwaredev

tl;dr:

Code: https://github.com/dk1027/ConvolutionalNeuralNetOnFoosballSounds
IPython notebook: CNN on Foosball sounds.ipynb
Trained CNN model using TensorFlow: model.ckpt
Pickled Pandas dataframe: full_dataset_44100.pickle

Abstract

I set up mics at the foosball table and recorded a few hours of foosball games. The audio files were labelled by hand and then segmented into one-second clips of goals / other noises. Mel spectrograms were created from the clips, and about 200 samples were created and used for training, testing, and validation, resulting in 5% error on test data.

Data collection and labelling

I used a Zoom H5 XY stereo mic, a Shure SM57, and a few other mics for recording. Each mic had its own characteristic and they were placed at different locations around the table, for instance, pointing close to a goalie, high above the table pointing downward, or from one side of the table pointing at a goalie at the far side. There might be enough differences…

View original post 578 more words

A Dramatic Tour through Python’s Data Visualization Landscape (including ggplot and Altair)

Regress to Impress

Why Even Try, Man?


I recently came upon Brian Granger and Jake VanderPlas's Altair, a promising young visualization library. Altair seems well-suited to addressing Python's ggplot envy, and its tie-in with JavaScript's Vega-Lite grammar means that as the latter develops new functionality (e.g., tooltips and zooming), Altair benefits, seemingly for free!

Indeed, I was so impressed by Altair that the original thesis of my post was going to be: "Yo, use Altair."

But then I began ruminating on my own Pythonic visualization habits, and, in a painful moment of self-reflection, realized I'm all over the place: I use a hodgepodge of tools and disjointed techniques depending on the task at hand (usually whichever library I first used to accomplish that task).

This is no good. As the old saying goes: "The unexamined plot is not worth exporting to a PNG."

Thus, I’m using my discovery…

View original post 5,025 more words

I’m Back.


Hey guys, it's been so long since I last posted anything on this blog of mine. Honestly, I had every intention of continuing to blog, but I just got caught up with life. So, here I am, back. The main intention of starting this blog was to share what I learnt from my projects and my studies with others, which in a way helped me understand things better. I had started learning Python with the intention of learning about machine learning, and hence named the blog pythonformachinelearning. One of the reasons I figured I wasn't able to contribute to the blog was the disconnect between what I was learning at my full-time job and Python, especially machine learning. I still use Python for my day-to-day work as a pretentious data scientist, and in product development/management, integrating insights and methods into marketing technology products at Sokrati Inc. I really missed blogging and sharing what I learnt every day, but the niche name of my blog kept me at bay. So, today I decided to rename my blog to python for everything lazy. I intend to make it a place where I share small insights and ways of using Python, like I do, to get the mundane things done.

I didn’t really miss blogging though.


Laters. 🙂

Your Very Own Personalised Image Search Engine using python.

For the last couple of weeks I was on a voyage that took me around India, to places I have always wanted to go. My parents decided that we should celebrate my birthday in the foothills of the Himalayas, which was brilliant, as it was something I had longed for: a break from regular city life and a move to the hilly side of the country, with treacherous roads winding into blind corners and tea gardens, in the north-eastern state of India called Sikkim. The journey was brilliant and the stay was even better.

As happens on all such journeys, we clicked numerous photographs, posing in different stances to settle on the best ones. With so many photographs, and my mum's request to show all the pictures taken at a particular site, I was bogged down with a tedious task that required a lot of human effort and was mundane to the point that my head started aching, as I had no clue which folder a particular picture came from. So I thought, why not automate the whole process?

So I wrote a script, and later added a GUI for my mum, who is not tech savvy, turning it into a very easy four-click process. My software was able to retrieve the images I was looking for from around 400 GB of images on my PC's hard disk. Now I am sharing it with you guys.

One of the best results I got from this software was when I gave it an image straight off the internet that resembled the picture I was looking for, and it retrieved that particular image from a heap of 5,300 images on my drive in about 20 seconds. Results were even better when I used an image lying on my desktop and wanted to find the folder it belonged to on one of the drives. 😀

The visually similar image I took from Google Images was this one from Sonya and Travis's blog. Huge shout out to them 🙂 . Go check out their blog as well. So here is the picture:

[Image: the Golden Temple, Amritsar (photo from Sonya and Travis's blog)]

And I was able to retrieve this image from the hard disk, along with its location, so the rest of the related images could be found too.

[Image: the matching photo retrieved from the hard disk]

After adding a GUI to the script, I ended up with a search engine / image retrieval system which looked something like this :D.

[Screenshot: the image search engine GUI]

 

By the end of this post you will be able to make your very own image search engine, like TinEye.

Here’s a quick little demo of what I was able to accomplish.

So let's get the skeleton of the image search engine ready and get up close and personal with the cool stuff.



BREAKING DOWN THE PROBLEM STATEMENT.

Our search engine will have the following features, in the order of execution given below.

  • Selection of the picture to be searched for.
  • Selection of the directory where the search needs to be carried out.
  • Searching the directory for all the pictures.
  • Creating a feature index of the pictures.
  • Evaluating the same features for the search picture.
  • Matching the pictures in our search.
  • Outputting the matched pictures.

SELECTION OF THE SEARCH PICTURE AND SEARCH DIRECTORY

First of all, we will require OpenCV for Python to continue with this tutorial, so please download and install it; a quick Google search will get you what you need. The other libraries that need to be imported are as follows.

import os
from os.path import join
import cv2
import numpy as np
import time as time
import scipy.spatial.distance as dist

Here we just need to know the address of the search image, and we need to specify the directory where the search is to be carried out, which looks something like this.

directory=r"D:\photo"
searchImage=r"C:\image11.jpg"

This is the format in which the image and directory are passed into the system (the r prefix keeps Windows backslashes from being treated as escape characters). Now we have the search directory and the image we want to search for.


 SEARCHING THE DIRECTORY FOR ALL THE PICTURES AND MAKING AN INDEX

Now that we know the directory we want, the thing is you need an index of images to compare your search image against. For this we need to crawl your computer looking for images in JPG format and compute the features we will need for comparison later.

index={}

def find(directory):
    # walk the directory tree and index every JPG image by its full path
    for (dirname,dirs,files) in os.walk(directory):
        for filename in files:
            if filename.lower().endswith(".jpg"):   # match .jpg as well as .JPG
                fullpath=join(dirname,filename)
                index[fullpath]=features(fullpath)

    print "total number of photos in this directory %s"%len(index)
    return index

Here we use os.walk from the os library to scan a particular directory for all the JPG files, and then use our features function to generate features for each photo and add them to the dictionary named index, with the image address (fullpath) as the key, so that we know where each image was found and can retrieve it later.


FEATURE FUNCTION FOR OUR IMAGES

Now we are going to evaluate the features of our images. What are features, by the way? A feature is something that distinctly describes an image, much like the skin tone of most Indian people being brown compared to someone from Europe; these are features that might help you distinguish between them. Of course there are other features, like facial descriptions or voice, that could be used, as there is no limit to the kind and number of features you use.

The feature we are using here is called the colour histogram of an image. It is basically a frequency plot of the intensities of the red, green and blue channels over all the pixels. This is one brilliant video about colour histograms of images; check it out to understand them in depth.

Now let's get started with defining the function. Our function takes the image location as input and returns the histogram values.

def features(imageDirectory):
    # read the image and compute a 3D colour histogram with 8 bins per channel
    img=cv2.imread(imageDirectory)
    histogram=cv2.calcHist([img],[0,1,2],None,[8,8,8],[0,256,0,256,0,256])
    # normalise so the feature does not depend on the image size
    Nhistogram=cv2.normalize(histogram)
    return Nhistogram.flatten()

What we are doing here is using cv2 (OpenCV for Python) to read the file and then to generate a matrix containing the histogram values of the image.
Nhistogram is the normalised histogram. Normalising the histogram helps make the feature scale invariant: even if you increase or decrease the size of the image, the histogram produced will always be similar, if not exactly the same. The histogram is also robust to image rotation, because it ignores where pixels are located, so even if the image is rotated 90 degrees or by any other amount, the histogram will remain similar to the original one.

Finally we flatten the histogram matrix, bringing its dimensionality down to 1, which is essentially a list of numerical values.
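As a quick, optional sanity check of the scale-invariance claim above, here is a small sketch (not part of the search engine itself; the file path is only a placeholder) that compares the feature vector of an image with that of a half-sized copy.

# optional sanity check: a resized copy should give a nearly identical feature vector
img = cv2.imread(r"C:\image11.jpg")                 # placeholder path, use any test image
small = cv2.resize(img, (0, 0), fx=0.5, fy=0.5)

def hist_of(image):
    h = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(h).flatten()               # OpenCV 2.4-style normalize call

print dist.euclidean(hist_of(img), hist_of(small))  # a small value means the histograms are nearly the same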

Now we have our features ready.


 

COMPARING THE FEATURES.

Our feature function is ready, so we can compute the histogram of any image.

All we need now is a function that ranks the images after comparison and finally gives us the top 10 images that look most similar to our search image.

def search(SearchImage,SearchDir):
    # compute the histogram of the query image and compare it against the index
    histim=features(SearchImage)
    allimages=find(SearchDir)
    match=top(histim,allimages)
    return match

Now that we have defined the function for searching and producing the search results, all that is left is to write the function that finds the top 10 matches, i.e. the images that are visually closest to ours.

def top(histim,allimages):
    # score every indexed image against the query histogram and keep the 10 best
    correlation={}
    for (address,value) in allimages.items():
        correlation[address]=cv2.compareHist(histim,value,cv2.cv.CV_COMP_CHISQR)
    # a smaller chi-squared distance means a closer match, so sort in ascending order
    ranked=sorted(correlation.items(),key=lambda tup:float(tup[1]))
    return ranked[0:10]

We are using the chi-squared distance to measure how similar the two histograms are. You could use other distance measures to get better results, or even take a weighted average of the results from different distance functions, such as the city block (Manhattan) or Canberra distances. The lower the chi-squared distance, the more likely the two images are to look similar.

sorted is used to sort the images in increasing order of chi-squared distance: the smaller the distance, the better the match. Hence the top 10 results are the first 10 entries in the list returned by the top function.
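If you want to experiment with the other distance measures mentioned above, the scipy functions we imported at the top as dist can be dropped in. The helper below is a hypothetical variant of top, not part of the original script, and it assumes the same flattened histograms.

def top_scipy(histim, allimages, metric=dist.cityblock):
    # hypothetical variant of top() using a scipy distance (cityblock, canberra, ...)
    scores = {}
    for (address, value) in allimages.items():
        scores[address] = metric(histim, value)
    return sorted(scores.items(), key=lambda tup: float(tup[1]))[0:10]

For example, top_scipy(histim, allimages, metric=dist.canberra) ranks by Canberra distance instead.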


Now we have all the functions ready. All we need to do is call them in a particular order to make our search work.

First we define our search directory and search image.

directory=r"D:\photo"
searchImage=r"C:\image11.jpg"

We just need to pass these two parameters to the search function.

finalOutput=search(searchImage,directory)

finalOutput is a list of tuples of the form (address, value).

We iterate through the list and display the images as ranked.

for imageAdd,Histvalue in finalOutput:
    image=cv2.imread(imageAdd)
    resized=cv2.resize(image,(0,0),fx=0.25,fy=0.25)
    cv2.imshow("image directory %s %s"% (imageAdd,Histvalue),resized)
    cv2.waitKey(0)

There you have it: your very own personalised image search engine that works on your own data.

So if you want to know how to develop the GUI for this particular script, which will open up a whole new world of programming and software development, sign up 😁 using the follow option of this blog (in the widgets column on your right-hand side) to get an email notification when I post the next tutorial on developing the GUI.

If you have any query or need further explanation, leave it in the comments below.

Caveat: the search speed really depends on what calculations you perform on the histograms. Built-in functions implemented in C tend to take a lot less time than self-defined Python functions, but it all boils down to the trade-off between accuracy and speed.
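As a rough illustration of that trade-off, here is a small sketch (not part of the script above; it simply reuses whichever histograms you already have) that times the C-backed cv2.compareHist against a naive pure-Python chi-squared loop on the same pair of histograms.

h1 = features(searchImage)      # any histogram from the index works here
h2 = h1.copy()

start = time.time()
for _ in range(1000):
    cv2.compareHist(h1, h2, cv2.cv.CV_COMP_CHISQR)
print "OpenCV (C) version: %f s" % (time.time() - start)

def chi2_py(a, b, eps=1e-10):
    # naive pure-Python version of the chi-squared distance: sum((a-b)^2 / a)
    return sum((x - y) ** 2 / (x + eps) for x, y in zip(a, b))

start = time.time()
for _ in range(1000):
    chi2_py(h1, h2)
print "pure Python version: %f s" % (time.time() - start)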

Everything You Wanted to Know About Machine Learning, But Were Too Afraid To Ask (Part One)

The Official Blog of BigML.com

Recently, Professor Pedro Domingos, one of the top machine learning researchers in the world, wrote a great article in the Communications of the ACM entitled "A Few Useful Things to Know about Machine Learning". In it, he not only summarizes the general ideas in machine learning in fairly accessible terms, but he also manages to impart most of the things we've come to regard as common sense or folk wisdom in the field.

It's a great article because it's a brilliant man with deep experience who is an excellent teacher writing for "the rest of us", and writing about things we need to know. And he manages to cover a huge amount of ground in nine pages.

Now, while it’s very light reading for the academic literature, it’s fairly dense by other comparisons.  Since so much of it is relevant to anyone trying to use BigML…

View original post 1,307 more words

HDFS and MapReduce: a non-programmer's guide to BIG DATA

HADOOP

Alright, this is a small post I'm creating as a preparatory post for the HADOOP and MAP-REDUCE material coming in the next post. So let's get started.

What is HDFS? HDFS stands for Hadoop Distributed File System. The file system came into being when the amount of data being stored, processed and used increased exponentially between the late 1990s and the early 2000s. This was the time when search engines were being developed to quantify and rank data, and for that, huge amounts of data needed to be read for various operations.

The difficulty? Back in the days when data was limited and could be stored on drives smaller than 1024 MB, data was accessed at speeds measured in MB/s, which meant it could easily be read off the drives in a matter of minutes.

But as data scaled to terabytes and petabytes, the rate at which it could be accessed was not able to catch up. At a transfer speed of 100 MB/s we were trying to process terabytes of data, which was not feasible at all considering the time it would take.

In order to solve this problem, HDFS was developed. HDFS provides a way to distribute data across multiple servers or connected computers in chunks of 64 MB.
These chunks are called blocks. Check out the classy diagram I drew B) All vintage and stuff.

[Diagram: data blocks spread across the nodes of a cluster]

Each machine that the data is sent to runs its own operating system, and these machines, known as nodes, are connected to each other through a super-fast LAN (local area network). Together, these computers are known as a cluster.

In order to prevent losing data when a node fails or data access gets interrupted, 3 copies of each block are kept on different nodes. So if node 1 fails while processing block 1, node 2, which holds another copy of block 1, can take over and the process continues without much trouble. This makes HDFS robust to data loss due to node failures.

So that we know which node each block is on, another machine keeps track of it. This machine is known as the Name Node. These are the main components and fundamentals of HDFS; now let's move on to MAPREDUCE.

[Diagram: the Name Node keeping track of which blocks live on which nodes]


MAPREDUCE

If we want to process the data stored in HDFS, we use a paradigm called MapReduce. Google was the first known company to have used MapReduce in its work. MapReduce has two important phases:

  • 1st: the Mapper phase
  • 2nd: the Reducer phase

I am going to try and explain it using an analogy .

Think about the current situation, where the World Cup is happening in Brazil. The organisers want to find out the attendance and which country each person belongs to. The problem: a huge crowd waiting to get into the stadium = a huge amount of data.

Now, if everyone tries to get in through one single entry point it will lead to chaos, much like loading all the data onto one machine or node. And if a single counter tries to count the number of people from each country, it will be a very messy and slow process.

So what we do is distribute the crowd across different counters, just like the HDFS system distributes blocks of data across different machines.

[Diagram: the crowd split across multiple counters]

Now that the crowd is under control and the flow is good, we send a person to each counter to get the counting job done, and we will call this person a MAPPER. A mapper is a programme that is sent to each machine where the data lives.

What this mapper does is count the number of people from each country, and each counter's mapper maintains its own record. After all the mappers are done, they send their records, in sorted order, to another person called the REDUCER, as a list of counts from the different counters.

The reducer goes through the list, adding up the counts from each country, and produces the final result, which is the desired attendance count. That's it; that's how MapReduce works.
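To make the analogy concrete, here is a tiny, hypothetical pure-Python sketch of the two phases (it is not a real Hadoop job, which I will cover in the next post): the mapper emits a (country, 1) pair for every person it counts, and the reducer adds the pairs up per country.

from itertools import groupby
from operator import itemgetter

# pretend each "block" is the list of spectators' countries seen at one counter
blocks = [["India", "Brazil", "India"], ["Brazil", "Germany"], ["India"]]

def mapper(block):
    # emit a (country, 1) pair for every person counted
    return [(country, 1) for country in block]

def reducer(pairs):
    # group the sorted pairs by country and add up the counts
    return dict((country, sum(n for _, n in group))
                for country, group in groupby(sorted(pairs), key=itemgetter(0)))

mapped = []
for block in blocks:    # in Hadoop, each block would be mapped on its own node
    mapped.extend(mapper(block))

print reducer(mapped)   # counts per country: India 3, Brazil 2, Germany 1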


For more in-depth and technical know-how, please check the following link.

Now, I'll be posting about how to write a MapReduce script in Python in the coming week; this was just to get you started with the concept. 🙂 Later!! Please do like, subscribe and share. Those drawings took time 😛.


They all look the same. How do I classify ? :/

Today we will be dabbling in the world of classification analysis. To continue any further, you need to know what classification analysis is and what it is used for. Those of you who are already familiar with what classification and regression mean can skip ahead to the subsection where I describe the data we will be using today.


 

INTRODUCTION TO MACHINE LEARNING

Machine learning nowadays is put into two bags: supervised learning and unsupervised learning.

We will be discussing supervised learning in today's post. A brief description of, and the difference between, the two is as follows.

Say you are given a dataset containing the height, weight and hair length of volunteers, and you need to determine whether a particular volunteer is male or female; basically, you want to determine the gender.

In supervised learning, the height, weight and hair length values of each volunteer are labelled with that volunteer's gender, and this data can be used to formulate a rule, or model, to determine gender. The model can later be used to predict whether a particular person is male or female.

In unsupervised learning, we do not have labels identifying each volunteer's gender; what we can generalise is that there are two groups showing similar traits, which should therefore belong to the same category, and we can give them labels such as FIRST and SECOND. A small sketch contrasting the two is given below.
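Below is a minimal, hypothetical sketch of that difference, using a tiny made-up height/weight/hair-length dataset: the supervised model (an SVM) is shown the gender labels, while the unsupervised model (k-means) only sees the measurements and invents its own two groups.

from sklearn import svm
from sklearn.cluster import KMeans

# made-up data: [height (cm), weight (kg), hair length (cm)]
X = [[180, 80, 5], [175, 78, 7], [160, 55, 40], [165, 60, 35]]
y = [0, 0, 1, 1]             # gender labels, used only by the supervised model

clf = svm.SVC()
clf.fit(X, y)                # supervised: learns from the labels
print clf.predict([[170, 62, 30]])

km = KMeans(n_clusters=2)
km.fit(X)                    # unsupervised: no labels, it just finds two groups
print km.labels_             # group ids such as [0 0 1 1] or [1 1 0 0]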

Supervised machine learning is broadly divided into two tasks: classification analysis and regression analysis. As you might have intuitively figured out, classification refers to distinguishing between various objects or entities by labelling them with a name or some unique ID, in order to group similar entities together for easy recognition later.

Regression is about predicting continuous values. Here we are not trying to classify anything, but rather to predict future values, say stock or commodity prices. We won't talk about it in depth in this post, but we will in the next.


CLASSIFICATION LEARNING

 

There are various applications of machine learning for classification, such as distinguishing between a plant and an animal in a picture using image processing techniques. In the financial industry it is essential to assess whether a particular investment is good or bad, and the banking industry uses classification to categorise people who are likely to default on loans versus people who will pay their debt on time. This is one aspect of machine learning that is practically employed in our world today.

Today we will be dealing with classification of the iris plant into 3 different categories.


 

DATA SET

The Iris data set is available on the UCI Machine Learning Repository website, where you can find many different datasets to work with and learn about machine learning.

The Iris data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
The data set and more information can be found at this link: Iris Dataset.

These are the three categories of iris plant that we will try to classify:

IRIS SETOSA

IRIS VERSICOLOUR 

IRIS VIRGINICA

Notice how there are only very subtle differences between the categories. This is a limitation of human vision that can be overcome using machine learning.


GETTING STARTED IN PYTHON

We will be doing just basic visualisation and a very basic classifier in Python to classify between Iris setosa, Iris versicolour and Iris virginica.

The following libraries need to be installed: matplotlib, scikit-learn, NumPy and pandas.

If you are not able to install the packages properly, check out this particular video to get you through to the end stages.

You can also download Windows binaries from this website: Python Extensions. If you face any difficulty, please drop a comment below.


CODE SNIPPETS AND EXPLANATION

First we import pyplot from matplotlib and then import the iris data set from the sklearn (scikit-learn) library. We also import the numpy and pandas modules.


from matplotlib import pyplot as plt
from sklearn.datasets import  load_iris
import numpy as np 
import pandas as pd

Now that all the modules have been imported, we load the data into a variable called loaded.

loaded=load_iris()
features=loaded["data"]
feature_names = loaded["feature_names"]
labels=loaded["target"]

Now we create a pandas DataFrame for easy manipulation later, when plotting various exploratory graphs.

features=pd.DataFrame(features)
labels=pd.DataFrame(labels)

Now we join the two data frames by adding another column to features and filling it with the label values.

features[4]=labels

load_iris is a function in sklearn.datasets that returns a dictionary-like object; to access its attributes we use the object loaded.
We assign the feature matrix to the variable features and the feature names to feature_names.
labels holds the categories to be predicted, i.e. setosa, versicolour and virginica.

 

Now, for basic plotting, we use the pyplot module (imported as plt).

# column 0 = sepal length, column 2 = petal length; one colour per class
for i,color in zip(range(3),"rbg"):
    plt.scatter(features[features[4]==i][0],features[features[4]==i][2],c=color,marker="o")

plt.show()

What this code snippet does is plot the sepal length vs. petal length of the samples, with the colour governed by the label. This helps us see which attribute can be used to separate the categories of iris plant. One of the plots obtained is shown here. You don't really need to understand this code, because it's just for exploration; if you want an in-depth explanation, leave a request in the comments section and I'll be happy to answer any queries.

The X-axis is the sepal length and the Y-axis is the petal length.

As can be seen from the plot of sepal length vs. petal length, there is a marked difference between Iris setosa and the other two categories, and this can be used to form a model based on the attributes sepal length and petal length. Just by looking at the scatter plot you can form a rule for yourself: if the sepal length is less than 6.0 and the petal length is less than 2.5, the plant can be classified as setosa, which is denoted by the red dots in the plot. The red lines are called decision boundaries, as shown below.

[Plot: sepal length vs. petal length, with the setosa decision boundaries drawn in red]
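As a quick illustrative sketch of that hand-made rule (purely for demonstration, using the thresholds above and the DataFrame we built earlier, where column 0 is sepal length and column 2 is petal length):

def crude_rule(row):
    # hand-made rule read off the scatter plot: small sepal and petal length -> setosa
    if row[0] < 6.0 and row[2] < 2.5:
        return "setosa"
    return "not setosa"

print features.apply(crude_rule, axis=1).head(10)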

Now this is a very crude way of carrying out classification. But what if we have multiple properties or features that need to be used? What if we want to classify all three classes separately, rather than just setosa vs. non-setosa?

In such a situation we can use the SVM classifier. SVM stands for Support Vector Machine. What a support vector machine does is create the decision boundaries for you automatically, and if two categories cannot be separated, like versicolour and virginica in our case, it transforms the features into alternative features using a kernel function. I know "kernel function" is a heavy term, but all it means is that certain operations are performed on the features so that the new features can be separated linearly by straight decision boundaries; in a way, the transformation takes the features into another dimension. These decision boundaries might not be straight when brought back to the original dimensions. This video shows an SVM in action.
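Just to illustrate where the kernel comes in (a hedged sketch, not part of the tutorial's final code): scikit-learn's SVC lets you pick the kernel function explicitly, and swapping it changes the shape of the decision boundaries it can draw.

from sklearn import svm

linear_clf = svm.SVC(kernel="linear")   # keeps straight decision boundaries
rbf_clf = svm.SVC(kernel="rbf")         # the RBF kernel allows curved boundaries

# both are trained the same way on the raw measurements
linear_clf.fit(loaded["data"], loaded["target"])
rbf_clf.fit(loaded["data"], loaded["target"])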

Another beautiful example of decision boundaries can be seen on a zoo map, where different species are separated by different paths.

 


 

CODE SNIPPET FOR SVM CLASSIFICATION

from sklearn import svm
clf=svm.SVC()
# fit on the four measurement columns only (column 4 holds the labels we appended)
clf.fit(features[[0,1,2,3]],labels[0])

What this code does is initialise an object named clf, which is an instance of the SVC class in the svm module. The fit method is used to fit a model to the features and the labels (note that we fit on the four measurement columns only, since column 4 is the label column we appended earlier). The model fitting happens behind the scenes, and what you end up with is a clf object that can classify an iris plant, given the same kind of features that were provided while fitting the model.

Now let's try to predict using the clf classifier object we generated. For that we need the features of an iris plant to be classified, and we generate our own feature data by looking at the feature data we used.

[Screenshot: the first few rows of the iris feature data]

Class 0 is the setosa class, and we create our own feature set as a variable named unknown. unknown holds two feature sets, for two different plants, which I made by tweaking the values of the feature set we have on hand.


unknown=[[5.1,3.2,1.5,0.15],[4.6,3.0,1.6,0.3]]

Now it's time to predict. We use the classifier's predict method to do so.

clf.predict(unknown)

output


array([0, 0])

The output concurs with what we expected: it is an array of size two, with each element denoting the setosa class, which is enumerated as 0. It is that easy to derive a prediction from the classifier we built.
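If you prefer species names over numeric codes, the object returned by load_iris also carries a target_names array, so (as a small aside) you can map the prediction back:

print loaded["target_names"][clf.predict(unknown)]
# -> ['setosa' 'setosa']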

You are now equipped with all the knowledge needed to get started with basic prediction. Do check out the different types of classifiers and how they work.

And here is the whole code in one single snippet.

from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
from sklearn import svm

loaded=load_iris()
features=loaded["data"]
feature_names=loaded["feature_names"]
labels=loaded["target"]

"""now we create a pandas dataframe
for easy manipulation later while plotting various exploratory graphs."""

features=pd.DataFrame(features)
labels=pd.DataFrame(labels)

"""now we join the two data frames by adding another column in features
and adding values for labels in it."""

features[4]=labels

# column 0 = sepal length, column 2 = petal length; one colour per class
for i,color in zip(range(3),"rbg"):
    plt.scatter(features[features[4]==i][0],features[features[4]==i][2],c=color,marker="o")

plt.show()

#modeling the support vector machine

clf=svm.SVC()
# fit on the four measurement columns only (column 4 holds the labels we appended)
clf.fit(features[[0,1,2,3]],labels[0])

#now its time for prediction

unknown=[[5.1,3.2,1.5,0.15],[4.6,3.0,1.6,0.3]]

#predict
print clf.predict(unknown)

I hope you liked this post and aren't too angry about how long it was. So here's a perfunctory GIF to demonstrate how exhausting writing a blog post is.

DON'T FORGET TO FOLLOW! 🙂

13 energy data startups to watch in 2013

Energy is one thing that needs to be conserved and used frugally. Here are some startups, and soon-to-be big companies, that employ big data and machine learning methods to help you achieve just that.

Gigaom

Dozens of startups building analytics that can collect and analyze energy data emerged or grew their businesses in 2012. It was a hot trend for a variety of reasons including the development of big data technologies as a massive and growing business, the notion that energy analytics are a more attractive capital-lite business than smart grid hardware, and the reality that utilities need to digitize their power grids to provide better service in the modern age.

Here’s a list of 13 startups that we covered last year that I would keep an eye on in 2013. A couple of these, like Opower and Nest, I also included on my list last year:

1). Stem: Formerly called Powergetics and founded in 2009, Stem's software tracks and analyzes energy use in buildings and helps companies predict and control their energy budgets. In addition, the company helps building owners tap into installed…

View original post 690 more words

It's you again, DATA STRUCTURES!

Hey there! OK, let's start by saying this.

I did try to make a video this time around, about list comprehensions, but I goofed up, and I really need to improve my video-editing skills before I post any of it online, so the vlog might have to wait a while. I am going to continue where I left off: lists and working with lists.

So far I have covered how to initialise and work with a list. Now for some basic operations that come with lists; these operations are called methods. Getting back to the same example I gave you before: assume you are going to a marketplace, or Lidl, or your local grocery store, so you make a list of things you want to buy when you get there, something similar to what we have here.

So you initialise a list named shopping_list:

shopping_list=["milk", "butter", "Cheddar cheese", "plain yogurt"]

and now you want to carry out the following operations on your list:

  • Add a product to the list
  • Add multiple products to the list
  • Remove a particular product from the list
  • or maybe, for some god-knows-what reason, you want the index of a particular product. You know, some people just have these weird habits, or maybe you have OCD. I don't know; it's your problem. Deal with it!

Let's address these issues. Not the OCD one; you've still got to deal with that yourself.


 

ADDING a (single) PRODUCT to the list

OK, so what you do is use a simple method called append, with the following syntax:

shopping_list.append("A pack of beer")

What this does is add the item "A pack of beer" to the end of the list. So when you print your shopping_list using the print shopping_list command, what you get is this:

["milk", "butter", "Cheddar cheese", "plain yogurt", "A pack of beer"]

Someone is having a party tonight. What if you want to add multiple items?


 

ADDING multiple products to the shopping_list

Now if you want to add multiple items, say haribos, ice cream, salami and so on, you can use the method called extend.

So shopping_list.extend(["ice cream", "salami", "haribos"]) does the job for you; it adds the items to the end of the list. Now remember, the parameter you pass to extend has to be an iterable; for now, think of it as being the same as a list.

Now your list looks like this:

["milk", "butter", "Cheddar cheese", "plain yogurt", "A pack of beer", "ice cream", "salami", "haribos"]


 

REMOVING a Product from the list.

You know those days when you dream of having a lavish life and then you peek into your wallet…

 

and you are like .

Yeah, you've got to remove that pack of beer... sorry, man.

So how do you go about it? You use the method called remove, which takes as its argument the particular element you want to remove, so the exact value or item name needs to be specified. So:

shopping_list.remove("A pack of beer")

This results in your shopping list looking like this:

["milk", "butter", "Cheddar cheese", "plain yogurt", "ice cream", "salami", "haribos"]

It had to be done, bro!!! Sorry!

Now, there's another method you can call to remove elements, either from the end of the list or by index, if you know it.

shopping_list.pop() removes the last element of the list, so there go your haribos.

And if you know the index, say you know you wrote down butter as the second element of the list, you use

shopping_list.pop(1) (second element = index 1, remember), which removes the butter from the list… that cholesterol ain't gonna go down by itself.


 

FINDING INDEX of a particular product.

Now for that last, slightly obsessive requirement.

If you want to know at what index a particular item appears, you can use

shopping_list.index("salami"), which outputs the index at which salami appears in the list. A consolidated run of all these methods is sketched below.
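Here is a quick, self-contained recap of all the methods above in one go (a sketch you can paste straight into the interpreter):

shopping_list = ["milk", "butter", "Cheddar cheese", "plain yogurt"]

shopping_list.append("A pack of beer")                    # add one item at the end
shopping_list.extend(["ice cream", "salami", "haribos"])  # add several items at the end
shopping_list.remove("A pack of beer")                    # remove an item by value
shopping_list.pop()                                       # remove the last item (haribos)
shopping_list.pop(1)                                      # remove by index (butter)
print shopping_list.index("salami")                       # where salami sits now: 4
print shopping_list
# ['milk', 'Cheddar cheese', 'plain yogurt', 'ice cream', 'salami']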

by now you must be like SHUT UP ALREADY !!

 

So I'll also drop the chalk here and suggest you go through this particular exercise on Codecademy to build your basics and get a complete grasp of the subject. I sincerely request that you go through this PARTICULAR TUTORIAL (this is hyperlinked), where they teach you how to traverse a list and access each element. I know I haven't really covered much about machine learning yet, but this is important, as it is the base on which you build, so check that link out. In the next post I'll start with pandas and something called CSV files, so keep up.

Knock Knock ? Who’s there ? Data Structures.. :|

Hello!!!

As promised, today I'm dropping my post about the data structures available in Python.

For those of you who are not familiar with the concept: basically, a data structure defines the way our data is stored so that it can be accessed efficiently.

AND good machine learning happens only when we have good-quality data to create our features from. If you don't know what a feature is, stick with this blog; I will be posting about it later.

As for the data scientists out there: remember, most data scientists spend 70-80% of their time getting their data into shape.

Now, the reality is that most of the data generated nowadays is unstructured. This is similar to working at a store which looks something like this.

Lovely little shrine to bad-quality, unstructured data

Will you be ready to assist someone in finding a product of their choice in such a mess? HELL NO!!!


So what you do is organise your data in a very structured and well-thought-out manner, for easy access in the future, and end up with something like this.

Praise the lord!!!

Someone's job just got a lot easier. As a machine learner, or someone who wants to be a DATA SCIENTIST, YOU'VE GOT TO STRUCTURE YOUR DATA. Now, what kind of structure you use depends entirely on you and how you want to access the data; if speed is the need, or space is the problem, you might want to try a different structure, much like a store might try a circular setup like the one shown here.

 

What are our options? Well, when you talk about storing data in Python, the first thing that comes to mind is the list. I thought of creating a vlog about lists, but I'm kind of short on time, so I will put up a vlog about list operations, or comprehensions as we call them, later. Till then, check out this introductory video by KHAN ACADEMY; it explains the nuances of the list concept pretty concisely.

Now that you are familiar with the concept of a list, I will keep this post short. In the next post I will cover list comprehensions; that'll be tomorrow or today, depending on when I get the time. After learning about list comprehensions you should be able to script most algorithmic or data-munging (cleaning) requirements, so please keep up 🙂. This will be followed by posts about dictionaries and sets. I've never really used sets for machine learning, but there's no harm in knowing about them.

And sign up if you don't want to miss out on the next post.