[Tutor] Python Networkx with file in gexf format

Alan Gauld alan.gauld at yahoo.co.uk
Tue Jun 2 19:27:24 EDT 2020


On 02/06/2020 19:38, Daniel Wobmann wrote:

> I'm doing a continuing education, with Python being a relatively large part of it.

So are you learning python? Or learning something else in which
Python is used? From your message it sounds like the latter?

> We are not allowed to use Gephi. 

Lost me. Never heard of Gephi...

> Everything must be analyzed and derived in Python. 

OK and are they teaching you Python? If not it would seem a tad harsh!

> I have a dataset in gexf format

Nope! never heard of that either.

>  about the nodes "students" (student-ID) and "teachers" (teacher-ID), 
> where each node belongs to a school class and has a corresponding gender;...
> The edges connect the pupil-ID by means of "Origin" and "Destination". 

This all sounds like graph theory. Are you competent in graphs and
the associated math? Or is that part of your course?

> If someone can help me with this, I would send him/her the dataset by mail. 
> And of course I would pay something for the work.

You can of course make any commercial arrangements you wish but
that's not what this list is for. You ask questions about
programming/python/the standard library and we answer them.

> Now I wish to read out various information from this dataset with 
> Python and the package networkx 

Nope, lost me again. Never heard of it.

> - and above all to display it graphically. 

There are several plotting packages for Python. matplotlib is a popular one.

> This causes me many difficulties, because in python I want to work
> with the nodes / edges in the gexf document; but also with the values
> per item

It is normal programming practice when dealing with files (of any
format) to read them into memory using a data structure best suited
to the problem. Modified data can be written back to the files (in
the original format) when finished.

It's an exercise in frustration to try to process data in a sub-optimal
data structure intended for data storage rather than manipulation.
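
As a rough sketch of what I mean: assuming a gexf file is XML in which
each node element has an "id" attribute and each edge element has "source"
and "target" attributes (that layout, the sample text and all the names
below are my assumptions, so check them against your real file), the
standard library can load it into plain Python structures:

```python
import xml.etree.ElementTree as ET

# Invented sample data; a real file would be read with ET.parse(filename).
SAMPLE = """<gexf xmlns="http://www.gexf.net/1.2draft">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="s1" label="student 1"/>
      <node id="s2" label="student 2"/>
      <node id="t1" label="teacher 1"/>
    </nodes>
    <edges>
      <edge id="0" source="s1" target="s2"/>
      <edge id="1" source="s1" target="t1"/>
    </edges>
  </graph>
</gexf>"""


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def load_graph(xml_text):
    """Return (nodes, edges): a dict of node attributes keyed by id,
    and a list of (source, target) tuples."""
    root = ET.fromstring(xml_text)
    nodes, edges = {}, []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "node":
            nodes[elem.get("id")] = dict(elem.attrib)
        elif name == "edge":
            edges.append((elem.get("source"), elem.get("target")))
    return nodes, edges


nodes, edges = load_graph(SAMPLE)
```

Once the data is in a dictionary and a list like this, the rest of the
processing is ordinary Python rather than file handling.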

> 1. how can I find out from which data type a feature is? How can I 
> convert the datatype of a feature for example from String to Int (preprocessing engineering)?

The usual python approach is to use the int() type conversion.
eg

int("16") returns the number 16.
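
To find out what type a value has you can use type(). A small sketch
(the helper name to_int is just something I made up for illustration)
that also copes with strings that are not numbers:

```python
def to_int(text, default=None):
    """Convert text to an int, or return default if it cannot be converted."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default


value = "16"
print(type(value).__name__)   # str
number = to_int(value)
print(type(number).__name__)  # int
```

Values read from a text or XML file generally arrive as strings, so a
conversion step like this is usually needed before doing arithmetic.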

> 2. how can I calculate the number of edges per node (1 origin and how many targets?)? 

This is more about the choice of algorithm, which is a math question, not
strictly a Python one. The networkx package probably has a predefined
algorithm you can use, but you should ask their support forum about
that. It is not part of the standard python library.
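
That said, if the edges are held as a list of (origin, destination)
pairs, counting them per node needs nothing beyond the standard library.
A minimal sketch, with invented data and assuming that storage layout:

```python
from collections import Counter

# Invented (origin, destination) pairs standing in for the real edges.
edges = [("s1", "s2"), ("s1", "s3"), ("s2", "s3"), ("s3", "s1")]

# How many targets each origin points at.
out_degree = Counter(origin for origin, _ in edges)

# How many times each node appears as a target.
in_degree = Counter(dest for _, dest in edges)

for node in sorted(set(out_degree) | set(in_degree)):
    print(node, out_degree[node], in_degree[node])
```

A Counter returns 0 for a node it has never seen, so nodes with no
outgoing or no incoming edges need no special handling.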


> 4. develop and display the graph for the whole dataset. What does the code for this look like?

It will depend on how you choose to store the data. It may also depend
on what the networkx package offers. Again a question for their forum.

> 5. how can I display graphs, i.e. connections and nodes of a single
> school class (clusters?), single nodes (students) of the same class, etc. with different colors?
> What does the code for this look like?

See 4 above plus the reference to plotting packages...

> 6. subgraphs: Are they parts of a whole graph, as I understood it,

That sounds like a question for your course tutors. It's what
they are there for.

> 7. are subgraphs also called "subgroups" and "clusters"? 

As above.

> 8 How can I determine whether or which student is an "influencer" in the class 

Again this has little to do with python and more to do with
your course theory. I suggest you ask the tutors.

> 9. how can I remove items. For example, because they represent an outlier? 

That will depend on the data structures. Probably a networkx thing again.
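
For example, if the nodes are in a dictionary keyed by ID and the edges
in a list of (origin, destination) pairs (my assumption, as are the
names and data below), removing a node means dropping it and every edge
that touches it:

```python
def remove_node(nodes, edges, node_id):
    """Return copies of nodes and edges with node_id and its edges left out."""
    kept_nodes = {key: val for key, val in nodes.items() if key != node_id}
    kept_edges = [(a, b) for a, b in edges if node_id not in (a, b)]
    return kept_nodes, kept_edges


# Invented sample data.
nodes = {"s1": {"gender": "f"}, "s2": {"gender": "m"}, "s3": {"gender": "f"}}
edges = [("s1", "s2"), ("s2", "s3"), ("s1", "s3")]

nodes2, edges2 = remove_node(nodes, edges, "s2")
```

Returning new structures rather than modifying the originals keeps the
unfiltered data available for comparison.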

> 10. weight: I have read a lot about it, but I have not been able to figure 
> out what it is and what purpose it serves. What does this tell me? What can I do with it? 

Again this is not Python, this is your course theory.
Ask your tutors, it's their job.

> 11. How to calculate the following? For example, is this calculated per student ...
> - Degree Centrality
> - Betweenness Centrality
> - Closeness Centrality
> - Prestige Indegree
> - Ego Network

Again not python. Ask the tutors. Or even your fellow students?
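
For what it is worth, the simplest of these, degree centrality, is
commonly defined per node as the number of neighbours divided by
(n - 1), where n is the number of nodes. Your course may use a different
definition, so treat this as a sketch with invented data that ignores
edge direction:

```python
def degree_centrality(nodes, edges):
    """Map each node to (number of distinct neighbours) / (n - 1)."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Invented sample data.
nodes = ["s1", "s2", "s3", "s4"]
edges = [("s1", "s2"), ("s1", "s3"), ("s1", "s4"), ("s2", "s3")]
scores = degree_centrality(nodes, edges)
```

Here s1 is connected to every other node, so it gets the maximum score
of 1.0.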

> 12. Link Predictions? I heard that there are ways to use different models (or algorithms?) 
> to predict what other possible connections between nodes or the students (in our case) might 
> look like. For example Jaccard, Common Neighbours, Preferential Attachment, 
> Resource Allocation, etc. What does the code for these look like?

These are all more about the math than python.
The python code will totally depend on the data structures used.
Your networkx package may have some of it pre-coded for you. Ask them.
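
To show how much depends on the data structure: if each node's
neighbours are stored as a set, Common Neighbours and Jaccard reduce to
set operations. A minimal sketch with invented data, treating edges as
undirected (the function names are mine, not from any package):

```python
def build_neighbours(edges):
    """Map each node to the set of nodes it shares an edge with."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours


def common_neighbours(neighbours, u, v):
    """Nodes adjacent to both u and v."""
    return neighbours.get(u, set()) & neighbours.get(v, set())


def jaccard(neighbours, u, v):
    """Size of the shared neighbourhood divided by the combined one."""
    nu, nv = neighbours.get(u, set()), neighbours.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Invented sample data.
edges = [("s1", "s2"), ("s1", "s3"), ("s2", "s4"), ("s3", "s4")]
neighbours = build_neighbours(edges)
score = jaccard(neighbours, "s1", "s4")
```

A higher score for a pair that is not yet connected is read as a
likelier future link; how to interpret that is course theory again.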

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



