Class Diagram with GEF

rschumm wrote on Fri Nov 18 11:06:35 CET 2005:
Has somebody already had the idea to build a simple class diagram display
for Perl classes with GEF?
We have a little Perl script here that parses class files and generates
a Graphviz directed graph. 
It would be much cooler to have this integrated in Eclipse with GEF. So
we would have round-trip engineering - ok, just the half turn.
jploski wrote on Fri Nov 18 19:25:45 CET 2005:
I don't know about GEF, but something like Java's Type Hierarchy view would
definitely be nice to have.

Apropos parsing Perl source: I have been reworking EPIC to parse code more
accurately and faster for quite a few weekends now. I chose ANTLR instead
of the hand-crafted regexps on which it has relied so far. However, I am
nearing the (late) conclusion that parsing accurately enough in real time
for fast and correct syntax highlighting and to support features like "Open
SUB Declaration" and code completion is close to impossible.

I also briefly looked at the famed PPI module (there is talk that it will
be integrated into Komodo?), but it takes 8 seconds to process my sample
large file on a fast machine. Even caching the result (which you can IMHO
hardly do on an edited document) brings it down to just 1.5 seconds. So
PPI seems unacceptable as support for syntax highlighting (with ANTLR I
get something like 300ms, however with a lower quality of results).

Anyway, PPI might be interesting for less interactive things like refactoring
or creating accurate class hierarchies. Maybe you should have a look at
it.

Currently I am hesitating whether to check in my latest and "greatest" ANTLR-based
code to CVS, mostly because I am unsure whether other EPIC devs could live
with it. Not that there is much going on, but one should not take architectural
changes like this one too lightly.
rschumm wrote on Tue Nov 22 10:38:19 CET 2005:
I had a discussion with some students here on Saturday who built an Entity
Relationship Editor for Eclipse based on GEF / Graphical Editing Framework:

http://sourceforge.net/project/screenshots.php?group_id=143165
As I understood, GEF provides a Broadcaster/Listener Architecture to plug
in a DataModel of the Graph of the Code (that optionally can be built on
EMF / Eclipse Modeling Framework). All Changes to the DataModel will be
propagated to GEF, i.e. the Diagram.
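
As a rough illustration of that broadcaster/listener idea (this is not the
actual GEF or EMF API, and all names below are made up), a model can keep a
list of listeners and call them on every change, so that the diagram never
has to poll the model:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; this is NOT the GEF or EMF API.
Listener = Callable[[str, str], None]


class ClassModel:
    """Holds class names with their parents and broadcasts every change."""

    def __init__(self) -> None:
        self._parents: Dict[str, List[str]] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def add_parent(self, child: str, parent: str) -> None:
        parents = self._parents.setdefault(child, [])
        if parent in parents:
            return  # nothing changed, so nothing to broadcast
        parents.append(parent)
        for listener in self._listeners:
            listener(child, parent)

    def parents_of(self, name: str) -> List[str]:
        return list(self._parents.get(name, []))


class DiagramView:
    """Stands in for the diagram: records the edges it was told to draw."""

    def __init__(self, model: ClassModel) -> None:
        self.edges: List[Tuple[str, str]] = []
        model.add_listener(self.on_parent_added)

    def on_parent_added(self, child: str, parent: str) -> None:
        self.edges.append((child, parent))
```

A view created with DiagramView(model) then receives one callback per new
inheritance edge added through model.add_parent(...).
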
You already have the information about Classes and Methods (alas not the
member Variables, because Perl is too dynamic to predict the members before
runtime) somewhere in your Model that feeds the Outline View.
With a grep or your new ANTLR code it should be easy to also have the "use
base" information.
With that Model we could build a nice graphical Class Diagram (or maybe
Class Hierarchy View).
But the point is: I absolutely don't know when I should do this... (it would
be nice, by the way, to write some Java Code for a change... I miss it. :-)
) I have never written anything for the Eclipse Platform.
And the thing with OpenSubDeclaration and Refactoring: I agree with you
that this one will be very, very difficult (or even impossible?) with Perl,
because of its type-lessness, odd syntax and dynamics. I've read never-ending
discussions about this but I am not experienced enough to form my own informed
opinion... :-(
I came across PPI some days ago, but I really don't know if it is powerful
enough to do a "move method" or "change method signature"...?
jploski wrote on Tue Nov 22 19:55:48 CET 2005:
I have also written a GEF-based editor for presenting (Java class and package)
dependency diagrams two years ago or so. It was quite fun and easy. The
major part of work was implementing a graph layout algorithm because there
was nothing suitable in GEF back then (I don't know about today).

The Perl model you mentioned only exists in a rudimentary form for the currently
edited file. The way it was implemented originally (and can be seen in the
current CVS code), the whole source file is simply grepped through during
inactivity, and the results of this operation are directly fed into the
Outline view. Note that it also spots the "use" directives found in the
file. Folding is implemented on top of the "bracket matching" code, which
in turn uses information generated by syntax highlighting. The desirable
model-view separation is hardly present - it's all view and very little
of the actual "model". But it mostly works and can certainly be admired
for that reason.

I am currently attempting to replace the current "smoke & mirrors" architecture
with something more coherent, like parse once and generate a useful model
for multiple purposes. In fact, "parse once" is not sufficient, reparsing
must happen incrementally and quickly on each change. Some edits cause major
problems performance-wise: for example opening a POD comment at the beginning
of a huge file or entering a curly brace would both trigger reparsing the
whole file. This stuff is quite challenging to implement acceptably. The
benefits would be that syntax highlighting, bracket matching, outline etc.
could all be implemented in terms of the single model rather than by depending
on each other in mysterious ways.

I started small by trying to fix the Open SUB Declaration feature. However,
as I went further I experienced an avalanche effect - one change led to
another and now I am on the verge of replacing the whole cbg.editor plug-in
on which EPIC heavily depends with something new and tailored to Perl's
idiosyncrasies. These are big, non-incremental changes, not discussed with
anyone and not risk-free. So far the results show promise, yet I would not
hold my breath waiting for a release.

The class hierarchy/diagram you envision would require collecting information
from multiple source files, a whole project and beyond. I think it would
be reasonably easy to gather the information in a dedicated pass over the
files (contradicting my central model idea), but it would be challenging
to achieve any round-trip functionality. Also note that class hierarchies
can be expressed by @ISA, not necessarily by use base. Furthermore, the
use statements can appear anywhere (for example inside of conditional branches).
These things can be just as dynamic as variables in principle. Usually -
in most cases? - they resemble static declarations, which makes it worthwhile
to consider features such as you described.
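
A dedicated pass of that kind could be sketched roughly as follows. This is
Python rather than Perl or Java, the function and pattern names are
hypothetical, and it only recognises the plain static forms of "use base"
and "@ISA = ..." written on a single line. Anything inside string evals or
spread over several lines is missed, and a conditional "use" is treated as
if it always applied:

```python
import re
from typing import Dict, List

# Hypothetical patterns: they cover only the common single-line forms.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w:]+)\s*;")
USE_BASE_RE = re.compile(r"^\s*use\s+base\s+(.+?);")
ISA_RE = re.compile(r"^\s*(?:our\s+)?@ISA\s*=\s*(.+?);")
NAME_RE = re.compile(r"[A-Za-z_]\w*(?:::\w+)*")


def _names(expr: str) -> List[str]:
    """Pull class names out of a list expression such as qw(A B) or ('A')."""
    expr = re.sub(r"[@$%]\w+", " ", expr)  # drop variables like @ISA
    expr = re.sub(r"\bqw\b", " ", expr)    # drop the qw operator itself
    return NAME_RE.findall(expr)


def parents_by_package(source: str) -> Dict[str, List[str]]:
    """Map each package in one source text to its statically declared parents."""
    hierarchy: Dict[str, List[str]] = {}
    current = "main"  # Perl's default package
    for line in source.splitlines():
        match = PACKAGE_RE.match(line)
        if match:
            current = match.group(1)
            hierarchy.setdefault(current, [])
            continue
        match = USE_BASE_RE.match(line) or ISA_RE.match(line)
        if match:
            parents = hierarchy.setdefault(current, [])
            for name in _names(match.group(1)):
                if name not in parents:
                    parents.append(name)
    return hierarchy
```

Running this once per file and merging the resulting dictionaries would give
a project-wide picture, at the cost of rescanning everything on each pass.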

Apart from the parse-and-collect-information approach it might be worthwhile
to consider gathering information dynamically: just invoke an instance of
the Perl interpreter, let it execute "use XYZ" and query whatever you like.
This approach is already taken by EPIC sometimes (for example, in code autocompletion).
It is quite appealing to leverage the Perl interpreter for introspection
rather than reinvent the wheel yourself. The major limiting factor is the
performance overhead, but it should not matter much for your feature. I'd
say, if you enjoy it, go for it.
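
For illustration, a minimal sketch of asking the interpreter, assuming a
perl executable on the PATH. This is not how EPIC does it; the helper names
are made up, and it uses "require" instead of "use" so that the module name
can be passed in as an argument. Keep in mind that loading a module runs its
code:

```python
import re
import subprocess
from typing import List

# Hypothetical helper names; assumes a "perl" executable is available.
MODULE_RE = re.compile(r"[A-Za-z_]\w*(?:::\w+)*")

# Load the module named in the first argument, then print its @ISA,
# one parent class per line.
PERL_SNIPPET = (
    r'my $m = shift; eval "require $m; 1" or die $@; '
    r'print "$_\n" for @{"${m}::ISA"};'
)


def isa_query_command(module: str, perl: str = "perl") -> List[str]:
    """Build the argv list that asks Perl for a module's direct parents."""
    if not MODULE_RE.fullmatch(module):
        # The name ends up inside a string eval, so reject anything odd.
        raise ValueError("not a plain module name: %r" % module)
    return [perl, "-e", PERL_SNIPPET, module]


def query_isa(module: str, perl: str = "perl", timeout: float = 10.0) -> List[str]:
    """Run the interpreter and return the module's direct parent classes."""
    result = subprocess.run(
        isa_query_command(module, perl),
        capture_output=True,
        text=True,
        check=True,       # raise if the module cannot be loaded
        timeout=timeout,  # loading a module executes arbitrary code
    )
    return result.stdout.split()
```

Starting one interpreter per module is slow, which matches the performance
caveat above; a longer-lived helper process would be one way around that.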

As far as refactoring goes, the usual argument I hear is "refactoring originated
from Smalltalk, which is also dynamically typed, so it is possible". I know
neither Smalltalk nor its refactoring tools; they supposedly consult the user
to resolve ambiguities. My current view is that if one wanted to implement
something like that for Perl, PPI would be a good start because it provides
as accurate information as possible for a single file. Beyond that much
is unclear - PPI does not even pretend to do cross-module analysis or type
inferencing.

From my perspective, the fact that no popular refactoring tools exist for
Perl (and C++?) after so many years speaks against these languages. But
then, one does not always have the freedom to choose.
matisse wrote on Tue Nov 22 20:20:49 CET 2005:
Jan, I am so happy to see your posting - I admire your approach which is
both thoughtful and pragmatic.

If we can add even a few refactoring operations, that is progress - we
have one ("extract subroutine") now, and having more will only help. I suggest
adding whatever is easy to add - that may attract more users and developers,
and thus, more overall brainpower.

Eventually, Perl6 might make all this easier, but until then, adding a little
at a time is a Good Thing.
(See my article on Perl needing better tools: http://www.perl.com/pub/a/2005/08/25/tools.html)
rschumm wrote on Mon Dec 12 11:11:04 CET 2005:
Yes, the graph layout algorithm might be the biggest problem. I don't know
yet if there is something suitable in GEF. If there is not, there is no point
in writing such a class diagram plugin.

Furthermore, as you say, it would be difficult to integrate a model over
all files of a project. 

Nevertheless, I have realized that this project is a size too big for
me. I'd have to invest such an amount of time that I suppose it'd be easier
to persuade my company to stop using Perl in favour of Java. ;-) And that's
an even better prospect.

Surfing around on Eclipse I found this new Project: http://www.eclipse.org/proposals/dltk/
They seem to have the same idea, but through a totally different approach.
I've not read through these things carefully, but I doubt that they are
able to face all these problems with the dynamism and awful syntax of Perl
we discussed.

Nevertheless: I will of course continue using e-p-i-c all day because there
still is a lot of Perl Code here. 

If you're interested I could post our little Perl script that generates
a Graphviz class diagram with inheritance and method names. It's a
very simple approach but very useful.
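
This is not the script mentioned above, just a sketch of what such a
generator might look like, assuming the parents and method names per class
have already been collected. The function name and the choice of record
nodes with hollow arrowheads are assumptions:

```python
from typing import Dict, List


def class_diagram_dot(parents: Dict[str, List[str]],
                      methods: Dict[str, List[str]]) -> str:
    """Render classes as Graphviz record nodes with inheritance edges."""
    names = set(parents) | set(methods)
    for parent_list in parents.values():
        names.update(parent_list)  # parents that were never parsed themselves

    lines = ["digraph classes {", "  rankdir=BT;", "  node [shape=record];"]
    for name in sorted(names):
        # "\l" left-aligns each method name inside the record field.
        body = "".join(method + r"\l" for method in methods.get(name, []))
        lines.append('  "%s" [label="{%s|%s}"];' % (name, name, body))
    for child in sorted(parents):
        for parent in parents[child]:
            # Edge points from the subclass to its parent, UML style.
            lines.append('  "%s" -> "%s" [arrowhead=empty];' % (child, parent))
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be written to a file and rendered with the dot tool,
for example "dot -Tpng classes.dot -o classes.png".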
adamkennedy wrote on Fri Dec 16 03:33:13 CET 2005:
To summarise the PPI situation...

1. It was never intended for real-time work.

There are so many cascading problems parsing Perl code I took the attitude
that it was more important to be right than fast. In fact, there are some
problems that are provably impossible and chasing them becomes an exercise
in folly.

That said, it has only this week been completed to the point of parsing
random line noise, so as far as Make It Work goes we're done.

Which means Make It Fast is only just getting started. Particularly for
large documents (go read Genezzo::Parse::SQL) it is most definitely slow.

2. PPI is usable for background processing

Even the Perl interpreter is useless for real-time, and so PPI is intended
only for use in background processing.

3. PPI does not implement cross-method analysis.

PPI doesn't implement any analysis at all, it is a parser and only a parser.
However, it does provide the necessary base for writing this sort of code.

See things like Perl::Metrics, PPIx::Analyze and so on.

As for type inferencing, you should know as well as I do that Perl doesn't
support types, and any attempt to try in a universal way would be an exercise
in folly.

I've been slowly working towards a proof of concept of what a "refactoring
editor" might actually look like, and the things that can be done for Perl
might not look like what you are used to in Eclipse. But we'll see...


matisse wrote on Fri Dec 16 04:35:38 CET 2005:
I am very glad to see you weigh in here, and everything you posted makes
sense to me.

In January I expect to be working with Jeff Thalhammer,
who wrote "perlcritic", which uses PPI. Maybe we can help make PPI faster,
and/or help add more refactoring features to EPIC.
jploski wrote on Fri Dec 16 08:40:23 CET 2005:
Adam,

Thanks for your insights. While I agree with the "make it work"/"make it
fast" approach in general, I realise that in some situations "make it work
fast (for some people)" might be even more preferable ;-).

My impression is that tools for inconvenient languages like Perl and C++ don't
progress because people draw the conclusion that "if you know you can't
make it work in general, don't bother to try". PPI is a counterexample,
yet I think the "folly"-related pieces of your comment might reflect some
of that attitude.

It is not a folly to try making a real-time Perl parser which works better
than present tools - which also means faster - when applied to my and my
co-workers' code (a limited subset of Perl indeed). On the other hand, I
do not care as much about it being able to parse line noise or a hand-crafted
sample which proves that "it cannot work".

Having said that, I would be very interested in how much you can speed up
PPI in the coming next phase of development. Also, thanks for getting PPI
out in the first place. Regardless of its suitability for one purpose or
another, it undeniably helps us better estimate the challenge. It can also
act as an excellent correctness benchmark for other aspiring Perl parsers.

Regards -
JPL
adamkennedy wrote on Sun Dec 25 13:31:12 CET 2005:
Jan

The attitude is born of many MANY wounds and scars, the result of years
of failure on the part of many brilliant people.

In fact, until I got my Perl Foundation grant, pretty much the entire Perl
community was convinced I was nuts for thinking I had a way around the problems.

I do think your parser will be needed. Any sufficiently useful Perl editor
will need three parsers.

1. Perl itself for things like test runs, debugging and other functions
that are worth the dangers.

2. PPI or the like (syntactically-thorough) for safety and completeness
when doing significant tasks (especially when modifying files).

3. A real-time parser for (at the very least) syntax highlighting and other
tasks that don't alter the code.

If you are working on the last of these good luck, and feel free to ask
questions. There are many things you can learn from the ways PPI does things.

But you REALLY need to pick your battles when attacking the syntax. Parsing
Perl is a fractally-complex problem, every time you think you've solved
something, you find 3-4 more problems.

Some of the problems with Perl's syntax will kick your ass if you try to
beat them comprehensively. Other problems it is certainly possible to make
some headway against.

But reading your description brought back the tilting-at-windmills feelings
I remember from when I tried to do the same thing 3-4 years ago.

So I guess my advice is "be careful when tilting at windmills".

As for making PPI faster, there's a few different directions to go in (take
a look at PPI::XS as a good starting point if you know any XS) and when
you get some time to look into it, I'll be happy to advise and assist :)

Note: The above is an archived snapshot of a forum thread. Use the original thread at sf.net to post comments.